(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 902 355 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 17.03.1999 Bulletin 1999/11

(51) Int. Cl.6: G06F 3/14

(21) Application number: 98307097.0

(22) Date of filing: 03.09.1998

(84) Designated Contracting States:
AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States:
AL LT LV MK RO SI

(30) Priority: 09.09.1997 US 926421

(71) Applicant: Compaq Computer Corporation
Houston Texas 77070 (US)

(72) Inventors:
• Santos, Gregory N.
Cypress, Texas 77429 (US)
• Elliott, Robert C.
Houston, Texas 77069 (US)

(74) Representative: Brunner, Michael John et al
GILL JENNINGS & EVERY
Broadgate House
7 Eldon Street
London EC2M 7LH (GB)






(54) System and method for invalidating and updating individual GART (graphic address remapping table) entries for accelerated graphics port transaction requests



(57) A computer system having a core logic chipset that functions as a bridge between an Accelerated Graphics Port ("AGP") bus device such as a graphics controller and a host processor and computer system memory, wherein a Graphics Address Remapping Table ("GART table") is used by the core logic chipset to remap virtual memory addresses used by the AGP graphics controller into physical memory addresses that reside in the computer system memory. The GART table enables the AGP graphics controller to work in contiguous virtual memory address space, but actually use non-contiguous blocks or pages of physical system memory to store textures, command lists and the like. The GART table is made up of a plurality of entries, each entry comprising an address pointer to a base address of a page of graphics data in memory, and feature flags that may be used to customize the associated page. The core logic chipset may cache a subset of the most recently used GART table entries to increase AGP performance when performing the address translation. A GART cache entry control register is used by an application programming interface, such as a GART miniport driver, to indicate to the core logic chipset that an individual GART table entry in the chipset cache should be invalidated and/or updated. The core logic chipset may then perform the required invalidate and/or update operation on the individual GART table entry without having to flush or otherwise disturb the other still relevant GART table entries stored in the cache.

Description

[0001] The present invention relates to computer systems using a bus bridge(s) to interface a central processor(s), video graphics processor(s), random access memory and input-output peripherals together, and more particularly, to utilizing a graphics address remapping table (GART table) for remapping non-contiguous physical memory pages into contiguous accelerated graphics port (AGP) device addresses, wherein selected entries of the GART table are cached to speed up the remapping process and, when a GART table entry in the cache is no longer valid or needs to be updated, a mechanism is used to mark that particular GART table entry without affecting other GART table entries in the cache.

[0002] Use of computers, especially personal computers, in business and at home is becoming more and more pervasive because the computer has become an integral tool of most information workers who work in the fields of accounting, law, engineering, insurance, services, sales and the like. Rapid technological improvements in the field of computers have opened up many new applications heretofore unavailable or too expensive for the use of older technology mainframe computers. These personal computers may be stand-alone workstations (high end individual personal computers), desk-top personal computers, portable lap-top computers and the like, or they may be linked together in a network by a "network server" which is also a personal computer which may have a few additional features specific to its purpose in the network. The network server may be used to store massive amounts of data, and may facilitate interaction of the individual workstations connected to the network for electronic mail ("E-mail"), document databases, video teleconferencing, white boarding, integrated enterprise calendar, virtual engineering design and the like. Multiple network servers may also be interconnected by local area networks ("LAN") and wide area networks ("WAN").

[0003] A significant part of the ever increasing popularity of the personal computer, besides its low cost relative to just a few years ago, is its ability to run sophisticated programs and perform many useful and new tasks. Personal computers today may be easily upgraded with new peripheral devices for added flexibility and enhanced performance. A major advance in the performance of personal computers (both workstations and network servers) has been the implementation of sophisticated peripheral devices such as video graphics adapters, local area network interfaces, SCSI bus adapters, full motion video, redundant error checking and correcting disk arrays, and the like. These sophisticated peripheral devices are capable of data transfer rates approaching the native speed of the computer system microprocessor central processing unit ("CPU"). The peripheral devices' data transfer speeds are achieved by connecting the peripheral devices to the microprocessor(s) and associated system random access memory through high speed expansion local buses. Most notably, a high speed expansion local bus standard has emerged that is microprocessor independent and has been embraced by a significant number of peripheral hardware manufacturers and software programmers. This high speed expansion bus standard is called the "Peripheral Component Interconnect" or "PCI." A more complete definition of the PCI local bus may be found in the PCI Local Bus Specification, revision 2.1; PCI/PCI Bridge Specification, revision 1.0; PCI System Design Guide, revision 1.0; PCI BIOS Specification, revision 2.1; and Engineering Change Notice ("ECN") entitled "Addition of 'New Capabilities' Structure," dated May 20, 1996, the disclosures of which are hereby incorporated by reference. These PCI specifications and ECN are available from the PCI Special Interest Group, P.O. Box 14070, Portland, OR 97214.

[0004] A computer system has a plurality of information (data and address) buses such as a host bus, a memory bus, at least one high speed expansion local bus such as the PCI bus, and other peripheral buses such as the Small Computer System Interface (SCSI), Extension to Industry Standard Architecture (EISA), and Industry Standard Architecture (ISA). The microprocessor(s) of the computer system communicates with main memory and with the peripherals that make up the computer system over these various buses. The microprocessor(s) communicates to the main memory over a host bus to memory bus bridge. The peripherals, depending on their data transfer speed requirements, are connected to the various buses which are connected to the microprocessor host bus through bus bridges that detect required actions, arbitrate, and translate both data and addresses between the various buses.

[0005] Increasingly sophisticated microprocessors have revolutionized the role of the personal computer by enabling complex applications software to run at mainframe computer speeds. The latest microprocessors have brought the level of technical sophistication to personal computers that, just a few years ago, was available only in mainframe and mini-computer systems. Some representative examples of these new microprocessors are the "PENTIUM" and "PENTIUM PRO" (registered trademarks of Intel Corporation). Advanced microprocessors are also manufactured by Advanced Micro Devices, Cyrix, IBM, Digital Equipment Corp., and Motorola.

[0006] These sophisticated microprocessors have, in turn, made possible running complex application programs using advanced three dimensional ("3-D") graphics for computer aided drafting and manufacturing, engineering simulations, games and the like. Increasingly complex 3-D graphics require higher speed access to ever larger amounts of graphics data stored in memory. This memory may be part of the video graphics processor system, but it would preferably (at lowest cost) be part of the main computer system memory. Intel Corporation has proposed a low cost but improved 3-D graphics standard called the "Accelerated Graphics Port" (AGP) initiative. With AGP, 3-D graphics data, in particular textures, may be shifted out of the graphics controller local memory to computer system memory. The computer system memory is lower in cost than the graphics controller local memory and is more easily adapted for a multitude of other uses besides storing graphics data.

[0007] The proposed Intel AGP 3-D graphics standard defines a high speed data pipeline, or "AGP bus," between the graphics controller and system memory. This AGP bus has sufficient bandwidth for the graphics controller to retrieve textures from system memory without materially affecting computer system performance for other non-graphics operations. The Intel 3-D graphics standard is a specification which provides signal, protocol, electrical, and mechanical specifications for the AGP bus and devices attached thereto. This specification is entitled "Accelerated Graphics Port Interface Specification Revision 1.0," dated July 31, 1996, the disclosure of which is hereby incorporated by reference. The AGP Specification is available from Intel Corporation, Santa Clara, California.

[0008] The AGP Specification uses the 66 MHz PCI (Revision 2.1) Specification as an operational baseline, with three performance enhancements to the PCI Specification which are used to optimize the AGP Specification for high performance 3-D graphics applications. These enhancements are: 1) pipelined memory read and write operations, 2) demultiplexing of address and data on the AGP bus by use of sideband signals, and 3) data transfer rates of 133 MHz for data throughput in excess of 500 megabytes per second ("MB/s"). The remaining AGP Specification does not modify the PCI Specification, but rather provides a range of graphics-oriented performance enhancements for use by 3-D graphics hardware and software designers. The AGP Specification is neither meant to replace nor diminish full use of the PCI standard in the computer system. The AGP Specification creates an independent and additional high speed local bus for use by 3-D graphics devices such as a graphics controller, wherein the other input-output ("I/O") devices of the computer system may remain on any combination of the PCI, SCSI, EISA and ISA buses.

[0009] To functionally enable this AGP 3-D graphics bus, new computer system hardware and software are required. This requires new computer system core logic designed to function as a host bus/memory bus/PCI bus to AGP bus bridge meeting the AGP Specification, and new Read Only Memory Basic Input Output System ("ROM BIOS") and Application Programming Interface ("API") software to make the AGP dependent hardware functional in the computer system. The computer system core logic must still meet the PCI standards referenced above and facilitate interfacing the PCI bus(es) to the remainder of the computer system. In addition, new AGP compatible device cards must be designed to properly interface, mechanically and electrically, with the AGP bus connector.

[0010] AGP and PCI device cards are neither physically nor electrically interchangeable even though there is some commonality of signal functions between the AGP and PCI interface specifications. The present AGP Specification only makes allowance for a single AGP device on an AGP bus, whereas the PCI Specification allows two plug-in slots for PCI devices plus a bridge on a PCI bus running at 66 MHz. The single AGP device is capable of functioning in both a 1x mode (264 MB/s peak) and a 2x mode (532 MB/s peak). The AGP bus is defined as a 32-bit bus, and may have up to four bytes of data transferred per clock in the 1x mode and up to eight bytes of data per clock in the 2x mode. The PCI bus is defined as either a 32-bit or 64-bit bus, and may have up to four or eight bytes of data transferred per clock, respectively. The AGP bus, however, has additional sideband signals which enable it to transfer blocks of data more efficiently than is possible using a PCI bus. An AGP bus running in the 2x mode provides sufficient video data throughput (532 MB/s peak) to allow increasingly complex 3-D graphics applications to run on personal computers.
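The peak rates quoted above follow from multiplying the data transfer rate by the number of bytes moved per transfer. The short Python sketch below reproduces them, assuming 1 MB/s means 10^6 bytes per second and treating the 2x mode as four bytes per transfer at the 133 MHz data transfer rate given in paragraph [0008]. The helper `peak_mb_per_s` is illustrative only and is not defined by any specification.

```python
def peak_mb_per_s(transfer_rate_mhz: int, bytes_per_transfer: int) -> int:
    """Peak bus throughput in MB/s: (10**6 transfers per second) x bytes per transfer."""
    return transfer_rate_mhz * bytes_per_transfer


# 32-bit AGP bus: 4 bytes per transfer.
agp_1x = peak_mb_per_s(66, 4)    # 1x mode at 66 MHz
agp_2x = peak_mb_per_s(133, 4)   # 2x mode at the 133 MHz transfer rate
print(f"AGP 1x: {agp_1x} MB/s, AGP 2x: {agp_2x} MB/s")
```

Computing the 2x rate instead as eight bytes per 66 MHz clock gives 528 MB/s; the 532 MB/s figure in the text corresponds to the 133 MHz transfer rate.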

[0011] A major performance/cost enhancement using AGP in a computer system is accomplished by shifting texture data structures from local graphics memory to main memory. Textures are ideally suited for this shift for several reasons. Textures are generally read-only, and therefore problems of access ordering and coherency are less likely to occur. Shifting of textures serves to balance the bandwidth load between system memory and local graphics memory, since a well-cached host processor has much lower memory bandwidth requirements than does a 3-D rendering machine. Texture access comprises perhaps the single largest component of rendering memory bandwidth, so avoiding loading or caching textures in local graphics memory saves not only this component of local memory bandwidth, but also the bandwidth necessary to load the texture store in the first place; further, this data must pass through main memory anyway as it is loaded from a mass store device. Texture size is dependent upon application quality rather than on display resolution, and therefore may require the greatest increase in memory as software applications become more advanced. Texture data is not persistent and may reside in the computer system memory only for the duration of the software application, so any system memory spent on texture storage can be returned to the free memory heap when the application concludes (unlike a graphics controller's local frame buffer, which may remain in persistent use). For these reasons, shifting texture data from local graphics memory to main memory significantly reduces computer system costs when implementing 3-D graphics.

[0012] Generally, in a computer system memory architecture the graphics controller's physical address space resides above the top of system memory. The graphics controller uses this physical address space to access its local memory, which holds information required to generate a graphics screen. In the AGP system, information still resides in the graphics controller's local memory (textures, alpha, z-buffer, etc.), but some data which previously resided in this local memory is moved to system memory (primarily textures, but also command lists, etc.). The address space employed by the graphics controller to access these textures becomes virtual, meaning that the physical memory corresponding to this address space doesn't actually exist above the top of memory. In reality, each of these virtual addresses corresponds to a physical address in system memory. The graphics controller sees this virtual address space, referenced
hereinafter as "AGP device address space," as one contiguous block of memory, but the corresponding physical memory addresses may be allocated in 4 kilobyte ("KB"), non-contiguous pages throughout the computer system physical memory.

[0013] There are two primary AGP usage models for 3D rendering that have to do with how data are partitioned and accessed, and the resultant interface data flow characteristics. In the "DMA" model, the primary graphics memory is a local memory referred to as "local frame buffer" and is associated with the AGP graphics controller or "video accelerator." 3D structures are stored in system memory, but are not used (or "executed") directly from this memory; rather, they are copied to primary (local) memory, which the rendering engine's address generator (of the AGP graphics controller) then references. This implies that the traffic on the AGP bus tends to be long, sequential transfers, serving the purpose of bulk data transport from system memory to primary graphics (local) memory. This sort of access model is amenable to a linked list of physical addresses provided by software (similar to operation of a disk or network I/O device), and is generally not sensitive to a non-contiguous view of the memory space.

[0014] In the "execute" model, the video accelerator uses both the local memory and the system memory as primary graphics memory. From the accelerator's perspective, the two memory systems are logically equivalent; any data structure may be allocated in either memory, with performance optimization as the only criterion for selection. In general, structures in system memory space are not copied into the local memory prior to use by the video accelerator, but are "executed" in place. This implies that the traffic on the AGP bus tends to be short, random accesses, which are not amenable to an access model based on software resolved lists of physical addresses. Since the accelerator generates direct references into system memory, a contiguous view of that space is essential. But since system memory is dynamically allocated in, for example, random 4,096 byte blocks of the memory, hereinafter 4 kilobyte ("KB") pages, it is necessary in the "execute" model to provide an address mapping mechanism that maps the random 4 KB pages into a single contiguous address space.
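As a minimal illustration of this page-granular view, the Python sketch below lists the 4 KB device pages that a contiguous buffer covers and looks up a physical page frame for each one. The mapping `page_frames` and its frame numbers are hypothetical, standing in for whatever address mapping mechanism the core logic actually provides.

```python
PAGE_SHIFT = 12                      # 4 KB pages: 2**12 = 4,096 bytes
PAGE_SIZE = 1 << PAGE_SHIFT


def pages_touched(dev_offset: int, length: int) -> range:
    """Indices of the 4 KB device pages covered by [dev_offset, dev_offset + length)."""
    if length <= 0:
        return range(0)
    first = dev_offset >> PAGE_SHIFT
    last = (dev_offset + length - 1) >> PAGE_SHIFT
    return range(first, last + 1)


# Hypothetical, deliberately scattered allocation: device page -> physical page frame.
page_frames = {0: 0x1A3, 1: 0x047, 2: 0x2F0, 3: 0x0B8}

buffer_pages = pages_touched(0x0800, 3 * PAGE_SIZE)   # 12 KB starting mid-page
print([hex(page_frames[p] << PAGE_SHIFT) for p in buffer_pages])
```

Here a 12 KB buffer that starts in the middle of a page touches four device pages, which this hypothetical mapping places on four unrelated physical pages; the accelerator still sees one contiguous range.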

[0015] The AGP Specification, incorporated by reference hereinabove, supports both the "DMA" and "execute" models. However, since a primary motivation of the AGP is to reduce growth pressure on the graphics controller's local memory (including local frame buffer memory), the "execute" model is preferred. Consistent with this preference, the AGP Specification requires a virtual-to-physical address re-mapping mechanism which ensures the graphics accelerator (AGP master) will have a contiguous view of graphics data structures dynamically allocated in the system memory. This address re-mapping applies only to a single, programmable range of the system physical address space and is common to all system agents. Addresses falling in this range are re-mapped to non-contiguous pages of physical system memory. All addresses not in this range are passed through without modification, and map directly to main system memory, or to device specific ranges, such as a PCI device's physical memory. Re-mapping is accomplished via a "Graphics Address Remapping Table" ("GART table") which is set up and maintained by GART miniport driver software, and used by the core logic chipset to perform the re-mapping. In order to avoid compatibility issues and allow future implementation flexibility, this mechanism is specified at a software (API) level. In other words, the actual GART table format may be abstracted to the API by a hardware abstraction layer ("HAL") or mini-port driver that is provided with the core logic chipset. While this API does not constrain the future partitioning of re-mapping hardware, the re-mapping function will typically be implemented in the core logic chipset.
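The pass-through rule can be sketched in a few lines of Python. This is an illustrative model only, assuming a single aperture described by hypothetical `aperture_base` and `aperture_size` values and a GART represented simply as a list of 4 KB-aligned physical page base addresses; as noted above, the real table format is left to the chipset and its miniport driver.

```python
from __future__ import annotations

PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1          # byte offset within a 4 KB page


def remap(addr: int, aperture_base: int, aperture_size: int,
          gart: list[int]) -> int:
    """Translate one address as the core logic would (hypothetical model).

    Addresses inside the single programmable range [aperture_base,
    aperture_base + aperture_size) are re-mapped page by page through the
    GART; every other address passes through unchanged.
    """
    if not aperture_base <= addr < aperture_base + aperture_size:
        return addr                          # outside the range: no re-mapping
    offset = addr - aperture_base
    page_base = gart[offset >> PAGE_SHIFT]   # one GART slot per 4 KB page
    return page_base | (offset & PAGE_MASK)


APERTURE_BASE = 0xE000_0000                  # hypothetical aperture placement
GART = [0x0123_4000, 0x0098_7000]            # hypothetical page base addresses

print(hex(remap(0xE000_1010, APERTURE_BASE, 2 * 4096, GART)))   # re-mapped
print(hex(remap(0x0010_0000, APERTURE_BASE, 2 * 4096, GART)))   # passed through
```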

[0016] The contiguous AGP graphics controller's device addresses are mapped (translated) into corresponding physical addresses that reside in the computer system physical memory by using the GART table, which may also reside in physical memory. The GART table is used by the core logic chipset to remap AGP device addresses that can originate from the AGP, host, or PCI buses. The GART table is managed by a software program called a "GART miniport driver." The GART miniport driver provides GART services for the computer software operating system.

[0017] Residing in the system memory, the GART table may be read from and/or written to by the core logic driver software, i.e., the aforementioned GART miniport driver, or any other software program or application specific interface ("API") program. The GART table is used by the computer system core logic to remap the virtual addresses of the graphics data requested by the AGP graphics controller to physical addresses of pages that reside in the computer system memory (translate addresses). Thus, the AGP graphics controller can work in contiguous virtual address space, but use non-contiguous pages of physical system memory to store graphics data such as textures and the like.

[0018] Typically the core logic will cache a subset of the most recently used GART table entries to increase system performance when doing the address translations. These cached GART table entries, however, may become stale (invalid) when the corresponding GART table entries in the system memory are subsequently updated, for example when the GART miniport driver receives a call by a graphics application program to allocate or de-allocate a page(s) of graphics data in the system memory, which requires the corresponding GART table entry to be updated. Keeping the cache coherent with such updates typically requires the core logic to snoop all of the write accesses to the GART table in system memory, which in turn requires comparison logic to be implemented in the core logic. The core logic may also implement a programmable register to invalidate all of the GART table entries stored in its cache each time the GART miniport driver updates the GART table in system memory. Implementing comparison logic in the core logic chipset for snooping can be difficult and drives the gate count up, which increases the cost and complexity of the core logic chipset. A global invalidation, on the other hand, flushes cached GART table entries which do not require invalidation, degrading AGP bus performance whenever address translation is required for an AGP transaction request. What is needed is a way of invalidating and/or updating an individual stale GART table entry cached in the core logic chipset without affecting the other cached GART table entries that are still valid.
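To make the cost of a global invalidation concrete, the hypothetical Python model below caches GART entries keyed by AGP device page index, keeps the most recently used ones, and offers both a global flush and a per-entry invalidate. The capacity and least-recently-used replacement policy are assumptions for illustration, not a description of any particular chipset.

```python
from __future__ import annotations


class GartCache:
    """Hypothetical LRU cache of GART entries keyed by AGP device page index."""

    def __init__(self, capacity: int = 8) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries: dict[int, int] = {}   # insertion order = recency, oldest first

    def lookup(self, page: int) -> int | None:
        entry = self.entries.pop(page, None)
        if entry is not None:
            self.entries[page] = entry      # mark as most recently used
        return entry

    def fill(self, page: int, entry: int) -> None:
        self.entries.pop(page, None)
        if len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))
            del self.entries[oldest]        # evict the least recently used entry
        self.entries[page] = entry

    def flush_all(self) -> None:
        """Global invalidation: every cached entry is discarded, stale or not."""
        self.entries.clear()

    def invalidate(self, page: int) -> None:
        """Per-entry invalidation: only the named entry is discarded."""
        self.entries.pop(page, None)


cache = GartCache()
for page in range(8):
    cache.fill(page, (0x100 + page) << 12)  # entries cached by earlier AGP traffic
cache.invalidate(3)                         # only page 3 was re-mapped
print(len(cache.entries))                   # 7 still-valid entries survive
cache.flush_all()
print(len(cache.entries))                   # 0: the valid entries are lost as well
```

After the per-entry invalidation seven still-valid translations remain available, whereas the global flush forces all of them to be re-fetched from system memory.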

[0019] It is therefore an object of the present invention to invalidate and/or update an individual GART table entry.

[0020] Another object of the present invention is to reduce the logic required in a core logic chipset having GART table address translation and a GART cache.

[0021] Another object is to eliminate the requirement for snooping writes to the GART table in physical memory.

[0022] Still another object is to mark a cached GART table entry as invalid when its corresponding page of graphics data in physical memory has been un-mapped.

[0023] Yet another object is to mark a cached GART table entry to be updated when its corresponding page of graphics data in physical memory has been re-mapped.

[0024] Still another object is to prefetch updated GART table entries from physical memory to the core logic chipset cache.

[0025] The above and other objects of the present invention are satisfied, at least in part, by providing in a computer system a core logic chipset that functions as a bridge between an AGP bus and host and memory buses, wherein a "Graphics Address Remapping Table" ("GART table") is used by the core logic chipset to remap virtual addresses into physical addresses that reside in the computer system memory. Entries of the GART table may also reside in the computer system memory. The core logic chipset uses the GART table entries so that an AGP graphics controller may reference addresses of graphics information in contiguous virtual address space, hereinafter "AGP device address space," but actually have the graphics information stored in non-contiguous blocks of the computer system physical memory. The graphics information may be textures, command lists and the like. The core logic chipset of the present invention caches the necessary GART table entries in order to speed up retrieval of the graphics data from the computer system memory.

[0026] The GART table is made up of a plurality of entries. A GART miniport driver creates the entries in the computer system memory that make up the GART table. Each of these entries comprises a translation pointer which references the physical address of the first byte of a page in physical memory, and feature flags associated with the referenced page. Each page in physical memory referenced by the GART table contains AGP graphics data such as textures. The feature flags may be used to customize each associated page of memory referenced by the pointer address. For example, a page in physical memory may contain 4,096 bytes (4 KB) of data such as textures, command lists and the like. The GART table entry may comprise four eight-bit bytes for a total of 32 bits of binary information. If the 20 most significant bits (31:12) in the GART table entry are used for the physical memory page address, the 12 least significant bits (11:0) are available for use by the systems designer in defining and/or customizing certain features and attributes associated with the memory page.
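The 20/12 split described above can be expressed as simple masking. The Python helpers below, `make_entry` and `split_entry` (hypothetical names), pack and unpack a 32-bit GART table entry under that layout; they make no assumption about what the individual flag bits mean.

```python
from __future__ import annotations

PAGE_BASE_MASK = 0xFFFF_F000    # bits 31:12: physical page base address
FLAG_MASK = 0x0000_0FFF         # bits 11:0: feature (flag) bits


def make_entry(page_base: int, flags: int) -> int:
    """Pack a 32-bit GART entry from a 4 KB-aligned page base and 12 flag bits."""
    if page_base & FLAG_MASK or not 0 <= page_base <= PAGE_BASE_MASK:
        raise ValueError("page base must be 4 KB aligned and fit in 32 bits")
    if flags & ~FLAG_MASK:
        raise ValueError("flags must fit in bits 11:0")
    return page_base | flags


def split_entry(entry: int) -> tuple[int, int]:
    """Return (page_base, flags) from a 32-bit GART entry."""
    return entry & PAGE_BASE_MASK, entry & FLAG_MASK


entry = make_entry(0x0034_5000, 0x003)
print(hex(entry), [hex(field) for field in split_entry(entry)])
```

With 20 bits of page address and 4 KB pages, such an entry can reference any page in a 2^20 x 4 KB = 4 GB physical address space.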

[0027] Some examples for use of these feature flags (least significant bits) are as follows: 1) a Cacheability Bit may indicate whether the 4 KB page is cacheable, 2) a Write Combinable Bit may indicate whether the 4 KB page is write combinable, 3) a Dirty Bit may indicate whether the page has been modified, 4) a Link Bit may indicate whether the next GART table entry is associated with the current GART table entry, and 5) a Present Bit may indicate whether the page referenced by the GART table entry (bits 31:12) is reserved by the GART miniport driver, i.e., the page is reserved in physical memory. Many other combinations of these feature bits may be utilized and are contemplated herein. These feature bits (11:0) may also be referred to hereinafter as "flag bits" and are typically managed by the GART miniport driver, but may be accessed by any other device driver of the computer system (e.g., ROM BIOS, etc.) because the GART table entries, typically, are located in the computer system memory. The core logic chipset of the present invention may cache the necessary GART table entries in order to speed up retrieval of the graphics data pages from the computer system memory and translation thereof to the AGP device address space.
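The text names these flags but does not fix their bit positions. The Python sketch below assigns them hypothetical positions within bits 11:0 purely for illustration and decodes them from an entry with the standard library's `enum.IntFlag`; a real chipset and its GART miniport driver would define their own layout.

```python
from enum import IntFlag


class GartFlag(IntFlag):
    """Feature flags named in the text, at HYPOTHETICAL positions within bits 11:0."""
    PRESENT = 0x001             # page reserved in physical memory by the miniport driver
    CACHEABLE = 0x002           # 4 KB page is cacheable
    WRITE_COMBINABLE = 0x004    # 4 KB page is write combinable
    DIRTY = 0x008               # page has been modified
    LINK = 0x010                # next GART entry is associated with this one


KNOWN_FLAGS = (GartFlag.PRESENT | GartFlag.CACHEABLE | GartFlag.WRITE_COMBINABLE
               | GartFlag.DIRTY | GartFlag.LINK)


def entry_flags(entry: int) -> GartFlag:
    """Return the known feature flags held in the low 12 bits of a GART entry."""
    return GartFlag(entry & int(KNOWN_FLAGS))


entry = 0x0034_5000 | int(GartFlag.PRESENT | GartFlag.CACHEABLE)
flags = entry_flags(entry)
print(GartFlag.PRESENT in flags, GartFlag.DIRTY in flags)
```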

[0028] In an embodiment of the present invention, the core logic chipset comprises a cache memory to store selected ones of the GART table entries if a single-level address remapping is implemented, and, in addition, selected ones of a GART directory if a two-level GART address remapping is implemented. Each cache entry stores a selected one of the GART table entries and is referenced to a page boundary of a linear address of the AGP device address space. The selected GART table entry stored in the cache has the base address of the page of graphics data stored in physical memory and the flag bits described hereinabove. Register logic is also provided to associate a "GART Cache Entry Update bit" and a "GART Cache Entry Invalidate bit" with each of the cached GART table entries.

[0029] AGP memory-mapped control registers of the core logic chipset are accessed via a base address register residing in a host-to-PCI bridge configuration header. A base address is determined and written to the base address register by the system BIOS during POST. The AGP memory-mapped control registers are used by the GART miniport driver to dynamically control AGP functionality within the chipset during operation of the computer system. Preferably, a GART Cache Entry Control Register is used by software, such as the GART miniport driver, to update/invalidate a specific GART cache entry. When the GART miniport driver receives a call to update/invalidate entries in the GART table (located in the system physical memory), it must also maintain coherency of the GART table entries in cache. If the GART table entry to be updated/invalidated is not present in the GART cache, the invalidate function will have no effect. If a cached GART table entry needs to be updated, the present invention may prefetch the new GART table entry from system physical memory and thus further speed address translations.

[0030] The GART Cache Entry Control Register may be 32 bits wide (double word) and comprises a GART Entry Offset having a plurality of bits (31:12) which define the AGP device address of the particular GART table entry to be invalidated/updated. The GART miniport driver derives this device address from the linear address (Lin-to-Dev command). When a device address is written to this register by the GART miniport driver, the chipset invalidates/updates the referenced cache entry based upon the appropriate setting in the GART Cache Entry Update and/or GART Cache Entry Invalidate bits (bits 1 and 0, respectively), as follows. When the GART Cache Entry Update bit is set to a logic 1, the chipset updates the GART cache entry referenced by the GART Entry Offset bits 31:12 with the current entry in the GART table in system memory. The update function is performed following the write to this register. When the update operation is completed, the core logic chipset may reset this bit to 0. The GART miniport driver may poll this bit to verify completion of the update operation. When the GART Cache Entry Invalidate bit is set to a logic 1, the chipset invalidates the GART cache entry referenced by the GART Entry Offset bits 31:12, if present in the GART table entry cache. The invalidate function may be performed immediately following the write to this register. When the invalidate operation is completed, the core logic chipset may reset this bit to 0. The GART miniport driver may poll this bit to verify completion of the invalidate operation. The core logic chipset may also prefetch into cache the GART table entries marked to be updated.
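The register protocol above can be modelled in Python as follows. `control_word` composes a register value from the field layout given in the text (bits 31:12 GART Entry Offset, bit 1 Update, bit 0 Invalidate); `SimulatedChipset` and `driver_update_entry` are hypothetical stand-ins for the core logic and the GART miniport driver, with a dictionary playing the part of the GART table in system memory. Processing the invalidate before the update when both bits are set is an assumption of this sketch.

```python
from __future__ import annotations

PAGE_SHIFT = 12
ENTRY_OFFSET_MASK = 0xFFFF_F000   # bits 31:12: GART Entry Offset (AGP device address)
UPDATE_BIT = 1 << 1               # bit 1: GART Cache Entry Update
INVALIDATE_BIT = 1 << 0           # bit 0: GART Cache Entry Invalidate


def control_word(device_address: int, invalidate: bool = False,
                 update: bool = False) -> int:
    """Compose a value for the GART Cache Entry Control Register."""
    value = device_address & ENTRY_OFFSET_MASK
    if update:
        value |= UPDATE_BIT
    if invalidate:
        value |= INVALIDATE_BIT
    return value


class SimulatedChipset:
    """Hypothetical core-logic model; operations complete synchronously."""

    def __init__(self, gart_in_memory: dict[int, int]) -> None:
        self.gart_in_memory = gart_in_memory   # device page -> GART entry in RAM
        self.cache: dict[int, int] = {}        # device page -> cached GART entry
        self.control = 0                       # GART Cache Entry Control Register

    def write_control(self, value: int) -> None:
        self.control = value
        page = (value & ENTRY_OFFSET_MASK) >> PAGE_SHIFT
        if value & INVALIDATE_BIT:
            self.cache.pop(page, None)         # no effect if the entry is not cached
            self.control &= ~INVALIDATE_BIT    # reset the bit to 0 when done
        if value & UPDATE_BIT:
            entry = self.gart_in_memory.get(page)
            if entry is not None:
                self.cache[page] = entry       # prefetch the current GART entry
            self.control &= ~UPDATE_BIT        # reset the bit to 0 when done

    def read_control(self) -> int:
        return self.control


def driver_update_entry(chipset: SimulatedChipset, device_address: int,
                        invalidate: bool = True, update: bool = False,
                        max_polls: int = 1000) -> bool:
    """Write the control register, then poll until the chipset clears bits 1:0."""
    chipset.write_control(control_word(device_address, invalidate, update))
    for _ in range(max_polls):
        if not chipset.read_control() & (UPDATE_BIT | INVALIDATE_BIT):
            return True
    return False


chip = SimulatedChipset({5: 0x0012_3001})
chip.cache[5] = 0x0099_9001                    # stale copy from earlier AGP traffic
print(driver_update_entry(chip, 5 << PAGE_SHIFT, update=True), hex(chip.cache[5]))
```

Because only the entry named by bits 31:12 is touched, any other cached entries are left in place, which is the property the invention relies on.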

[0031] For example, a graphics application requests that Microsoft Corporation's DirectDraw API de-allocate a 32 KB region of AGP memory for a texture. DirectDraw then issues a PageUnMap call to the AGP device driver (GART miniport driver) to unmap eight 4 KB pages in the GART table. The AGP device driver writes to the GART Cache Entry Control Register to invalidate only the eight GART table entries associated with the unmapped eight 4 KB pages of AGP memory. The core logic chipset of the present invention will determine if any of these eight GART table entries are cached and will invalidate them if present in the GART cache.
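For this PageUnMap example, the driver-side work reduces to one control-register write per 4 KB page. The Python sketch below computes those register values using the bit layout of the GART Cache Entry Control Register described above; the helper name and the sample device address are hypothetical, and the region is assumed to be page aligned.

```python
from __future__ import annotations

PAGE_SIZE = 4096
INVALIDATE_BIT = 1 << 0    # bit 0 of the GART Cache Entry Control Register


def unmap_invalidate_writes(region_device_address: int, region_size: int) -> list[int]:
    """Control-register values a driver would write to invalidate each page of a region.

    One write per 4 KB page: bits 31:12 carry the page's AGP device address and
    bit 0 requests invalidation. The chipset ignores pages it has not cached.
    """
    if region_device_address % PAGE_SIZE or region_size % PAGE_SIZE:
        raise ValueError("region must be 4 KB aligned and a whole number of pages")
    return [addr | INVALIDATE_BIT
            for addr in range(region_device_address,
                              region_device_address + region_size, PAGE_SIZE)]


writes = unmap_invalidate_writes(0x0010_0000, 32 * 1024)   # hypothetical texture location
print(len(writes))                                          # one write per 4 KB page
print(hex(writes[0]), hex(writes[-1]))
```

A 32 KB region yields exactly eight writes, matching the eight GART table entries in the example, and no other cached entry is disturbed.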

[0032] Another example is when a graphics application requests the DirectDraw API to allocate a 32 KB region of AGP memory for a texture. DirectDraw then issues a PageMap call to the AGP device driver to remap eight 4 KB pages in the GART table. Due to previous AGP transaction activity, the GART cache contains the previous address remap values for each of these eight pages. The AGP device driver writes to the GART Cache Entry Control Register to both invalidate and update the GART table entries currently in the GART cache. The invalidate function invalidates the current entry in the cache. The update function, however, may be used to cause the core logic chipset to prefetch the new GART table entry(ies) for anticipated future use when performing AGP transaction requests. This improves overall AGP performance because the translated address will already be stored in the core logic cache when the AGP memory access begins.
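The driver-side work in these two examples amounts to one register write per 4 KB page of the affected region. The C sketch below assumes the GART Cache Entry Control Register layout given in paragraph [0058] (bits 31:12 device address, bit 1 update, bit 0 invalidate) and 4 KB pages. The reg_log_t type and write_cache_entry_control function are hypothetical stand-ins for a real memory-mapped write, which would go through a volatile pointer into the BAR1 region; they simply record what would be written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the GART Cache Entry Control Register (see [0058]). */
#define GART_PAGE_SIZE         4096u
#define GART_ENTRY_OFFSET_MASK 0xFFFFF000u /* bits 31:12: AGP device address */
#define GART_CACHE_UPDATE      (1u << 1)   /* bit 1: GART Cache Entry Update */
#define GART_CACHE_INVALIDATE  (1u << 0)   /* bit 0: GART Cache Entry Invalidate */

/* Hypothetical stand-in for the memory-mapped register: records every value written. */
typedef struct {
    uint32_t writes[64];
    size_t   count;
} reg_log_t;

static void write_cache_entry_control(reg_log_t *rl, uint32_t value) {
    assert(rl->count < sizeof rl->writes / sizeof rl->writes[0]);
    rl->writes[rl->count++] = value;
}

/* Issue one register write per 4 KB page of [dev_addr, dev_addr + size), so only the
   entries for that region are touched. flags selects invalidate, update, or both.
   Returns the number of pages (and therefore register writes). */
static size_t gart_flush_region(reg_log_t *rl, uint32_t dev_addr, uint32_t size,
                                uint32_t flags) {
    assert((dev_addr & ~GART_ENTRY_OFFSET_MASK) == 0); /* region must be page aligned */
    size_t pages = (size + GART_PAGE_SIZE - 1u) / GART_PAGE_SIZE;
    for (size_t i = 0; i < pages; i++) {
        uint32_t page_addr = dev_addr + (uint32_t)(i * GART_PAGE_SIZE);
        write_cache_entry_control(rl, (page_addr & GART_ENTRY_OFFSET_MASK) | flags);
    }
    return pages;
}
```

For the 32 KB texture in [0031], a call with size 32 * 1024 and GART_CACHE_INVALIDATE issues exactly eight writes; the [0032] case would pass GART_CACHE_INVALIDATE | GART_CACHE_UPDATE instead. Entries outside the region are never named, which is what leaves the rest of the GART cache intact.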

[0033] A feature of the present invention is that individual GART table entries may be invalidated and/or updated in a GART cache without affecting other GART table entries that are still current.

[0034] An advantage of the present invention is that no snoop logic need be associated with the core logic chipset.

[0035] Another advantage is that the AGP device driver may control how many entries in the GART cache are invalidated each time DirectDraw makes a call to update an entry in the GART table residing in system physical memory.

[0036] Still another advantage is that GART table entries may be preloaded from the system physical memory before an AGP memory transaction request is issued by the AGP graphics device.

[0037] Other and further objects, features and advantages will be apparent from the following description of presently preferred embodiments of the invention, given for the purpose of disclosure and taken in conjunction with the accompanying drawings.

Figure 1 is a schematic block diagram of a prior art computer system;

Figure 2 is a schematic block diagram of a computer system according to the present invention;

Figure 3 is a schematic functional block diagram of an embodiment of the present invention according to the computer system of Figure 2;

Figures 4A-4C are schematic diagrams of a computer system memory map, a GART table in the computer system memory and a GART table entry, according to the present invention;

Figure 5 is a schematic functional block diagram and memory map according to the present invention;

Figure 10A is a schematic diagram of a memory map of an AGP single-level address translation;

Figure 10B is a schematic diagram of a memory map of an AGP two-level address translation;

Figure 11A is a schematic functional block diagram of the AGP single-level address translation according to Figure 10A;

Figure 11B is a table of bits required for page offset in a single-level translation;

Figure 11C is a schematic flow diagram of single-level address remapping;

Figure 12A is a schematic functional block diagram of the AGP two-level address translation according to Figure 10B;

Figure 12B is a table of bits required for directory and page offset in a two-level translation;

Figure 12C is a schematic flow diagram of two-level address remapping;

Figure 13 is a schematic diagram of a memory map of the GART table, according to the present invention;

Figure 14 is a schematic diagram of a memory map of entries in a GART directory, a page of GART table entries and an AGP memory, according to the present invention;

Figure 15 is a table of maximum GART table size versus size of AGP memory;

Figure 16 is a schematic functional block diagram of the AGP logical architecture;

Figure 17A is a schematic table of registers according to the AGP functional block diagram of Figure 16 and an embodiment of the present invention;

Figures 17B and 17C are tables of a functional description of the bits used in the AGP registers of Figure 17A, according to the present invention;

Figure 18A is a schematic table of registers according to the AGP functional block diagram of Figure 16 and an embodiment of the present invention;

Figures 18B-18M are tables of a functional description of the bits used in the AGP registers of Figure 18A, according to the present invention;

Figure 19A is a schematic table of memory-mapped registers according to the AGP functional block diagram of Figure 16 and an embodiment of the present invention;

Figures 19B-19N are tables of functional descriptions of the bits used in the AGP registers of Figure 19A, according to the present invention;

Figure 20 is a schematic memory map of caching GART table entries, according to an embodiment of the present invention;

Figure 21 is a schematic memory map of prefetching GART table entries, according to an embodiment of the present invention;

Figure 22A is a schematic table of AGP graphics controller configuration registers according to the AGP functional block diagram of Figure 16 and an embodiment of the present invention;

Figures 22B-22E are tables of functional descriptions of the bits used in the AGP registers of Figure 22A, according to the present invention;

Figure 23 is a table of best, typical, and worst case latencies for AGP, according to the present invention;

Figure 24 is a schematic functional block diagram of the AGP software architecture;

Figures 25A-25F are tables of software services provided by the GART miniport driver; and

Figures 26A and 26B are tables of software services available to the GART miniport driver.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0038] The present invention provides a core logic chipset in a computer system which is capable of implementing a bridge between host processor and memory buses, an AGP bus adapted for an AGP device(s), and a PCI bus adapted for PCI devices. The AGP device may be a graphics controller which utilizes graphical data such as textures by addressing a contiguous virtual address space, hereinafter "AGP device address space," that is translated from non-contiguous memory pages located in the computer system physical memory by the core logic chipset. The core logic chipset utilizes a "Graphics Address Remapping Table" ("GART table") which may reside in a physical memory of the computer system, such as system random access memory, and may be controlled by the core logic chipset software driver(s). The function of the GART table is to remap virtual addresses referenced by the AGP device to the physical addresses of the graphics information located in the computer system physical memory. Each entry of the GART table describes a first byte address location for a page of physical memory. The page of physical memory may be 4,096 bytes (4 KB) in size. A GART table entry comprises a memory address translation pointer and software controllable feature flags (see Figure 13). These feature flags may be used to customize the associated page of physical memory. API software and miniport drivers may write to and/or read from these feature flags.

[0039] For illustrative purposes, the preferred embodiment of the present invention is described hereinafter for computer systems utilizing the Intel x86 microprocessor architecture, and certain terms and references will be specific to those processor platforms. AGP and PCI are interface standards, however, that are hardware independent and may be utilized with any host computer designed for these interface standards. It will be appreciated by those skilled in the art of computer systems that the present invention may be adapted and applied to any computer platform utilizing the AGP and PCI Specifications.

[0040] The PCI specifications referenced above are readily available and are hereby incorporated by reference. The AGP Specification entitled "Accelerated Graphics Port Interface Specification Revision 1.0," dated July 31, 1996, as referenced above, is readily available from Intel Corporation, and is hereby incorporated by reference. Further definition and enhancement of the AGP Specification is more fully defined in "Compaq's Supplement to the 'Accelerated Graphics Port Interface Specification Version 1.0'," Revision 0.8, dated April 1, 1997, and is hereby incorporated by reference. Both of these AGP specifications were included as Appendices A and B in commonly owned co-pending U.S. Patent Application Serial No. 08/853,289, filed May 9, 1997, entitled "Dual Purpose Apparatus, Method and System for Accelerated Graphics Port and Peripheral Component Interconnect" by Ronald T. Horan and Sompong Olarig, and which is hereby incorporated by reference.

[0041] Referring now to the drawings, the details of preferred embodiments of the present invention are schematically illustrated. Like elements in the drawings will be represented by like numbers, and similar elements will be represented by like numbers with a different lower case letter suffix. Referring now to Figure 2, a schematic block diagram of a computer system utilizing the present invention is illustrated. A computer system is generally indicated by the numeral 200 and comprises a central processing unit(s) ("CPU") 102, core logic chipset 204, system random access memory ("RAM") 106, a video graphics controller 210, a local frame buffer 208, a video display 112, a PCI/SCSI bus adapter 114, a PCI/EISA/ISA bridge 116, and a PCI/IDE controller 118. Single or multilevel cache memory (not illustrated) may also be included in the computer system 200 according to the current art of microprocessor computer systems. The CPU 102 may be a plurality of CPUs 102 in a symmetric or asymmetric multi-processor configuration.
[0042] The CPU(s) 102 is connected to the core logic chipset 204 through a host bus 103. The system RAM 106 is connected to the core logic chipset 204 through a memory bus 105. The video graphics controller(s) 210 is connected to the core logic chipset 204 through an AGP bus 207. The PCI/SCSI bus adapter 114, PCI/EISA/ISA bridge 116, and PCI/IDE controller 118 are connected to the core logic chipset 204 through a primary PCI bus 109. Also connected to the PCI bus 109 are a network interface card ("NIC") 122 and a PCI/PCI bridge 124. Some of the PCI devices such as the NIC 122 and PCI/PCI bridge 124 may plug into PCI connectors on the computer system 200 motherboard (not illustrated).

[0043] Hard disk 130 and tape drive 132 are connected to the PCI/SCSI bus adapter 114 through a SCSI bus 111. The NIC 122 is connected to a local area network 119. The PCI/EISA/ISA bridge 116 connects over an EISA/ISA bus 113 to a ROM BIOS 140, non-volatile random access memory (NVRAM) 142, modem 120, and input-output controller 126. The modem 120 connects to a telephone line 121. The input-output controller 126 interfaces with a keyboard 146, real time clock (RTC) 144, mouse 148, floppy disk drive ("FDD") 150, and serial/parallel ports 152, 154. The EISA/ISA bus 113 is a slower information bus than the PCI bus 109, but it costs less to interface with the EISA/ISA bus 113.
[0044] Referring now to Figure 3, a schematic functional block diagram of the core logic chipset 204 of Figure 2, according to the present invention, is illustrated. The core logic chipset 204 functionally comprises CPU host bus interface and queues 302, memory interface and control 304, host/PCI bridge 306, AGP logic 318, and PCI/PCI bridge 320. The AGP logic 318 comprises AGP arbiter 316, GART cache 322, AGP data and control 310, and AGP request/reply queues 312. The CPU host bus interface and queues 302 connect to the host bus 103 and include interface logic for all data, address and control signals associated with the CPU(s) 102 of the computer system 200. Multiple CPUs 102 and cache memory associated therewith (not illustrated) are contemplated and within the scope of the present invention.

[0045] The CPU host bus interface and queues 302 interfaces with the host/PCI bridge 306 and memory interface and control 304 over a core logic bus 311. The CPU host bus interface and queues 302 interfaces with the AGP logic 318 over the core logic bus 311. The memory interface and control 304 interfaces with the AGP logic 318 over a core logic bus 309. An advantage of having separate buses 309 and 311 is that concurrent bus operations may be performed thereover. For example, video data stored in system RAM 106, connected to the bus 105, may be transferring to the video graphics controller 210 (AGP device) on the AGP bus 207 while the CPU 102 on the host bus 103 is accessing an independent PCI device (e.g., NIC 122) on the PCI bus 109.

[0046] The host bus interface and queues 302 allows the CPU 102 to pipeline cycles and schedule snoop accesses. The memory interface and control 304 generates the control and timing signals for the computer system RAM 106, which may be synchronous dynamic RAM and the like. The memory interface and control 304 has an arbiter (not illustrated) which selects among memory accesses for CPU writes, CPU reads, PCI writes, PCI reads, AGP reads, AGP writes, and dynamic memory refresh. Arbitration may be pipelined into a current memory cycle, which ensures that the next memory address is available on the memory bus 105 before the current memory cycle is complete. This results in minimum delay, if any, between memory cycles. The memory interface and control 304 also is capable of reading ahead on PCI master reads when a PCI master issues a read multiple command, as more fully described in the PCI Specification.

[0047] The host/PCI bridge 306 controls the interface to the PCI bus 109. When the CPU 102 accesses the PCI bus 109, the host/PCI bridge 306 operates as a PCI master. When a PCI device is a master on the PCI bus 109, the host/PCI bridge 306 operates as a PCI slave. The host/PCI bridge 306 contains base address registers for PCI device targets on its PCI bus 109 (not illustrated).

[0048] The AGP data and control 310, AGP arbiter 316, and AGP request/reply queues 312 interface to the AGP bus 207 and also have signal power and ground connections (not illustrated) for implementation of signals defined in the AGP and PCI Specifications. The AGP bus 207 is adapted to connect to an AGP device(s) and/or an AGP connector(s) (not illustrated). The GART cache 322 is used to store GART table entries for reordering and retrieving random non-contiguous AGP pages 412 (Figure 4A) in the computer system memory 106 to contiguous AGP device address space 406 for use by the graphics controller 210.

[0049] The PCI/PCI bridge 320 is connected between the PCI bus 109 and the AGP bus 207. The PCI/PCI bridge 320 allows existing enumeration code in the computer system BIOS 140 to recognize and handle AGP compliant devices, such as the video graphics controller 210, residing on the AGP bus 207. The PCI/PCI bridge 320, for example, may be used in configuring the control and status registers of the AGP graphics controller 210 or the AGP logic 318 by bus enumeration during POST, both being connected to the AGP bus 207, as more fully described hereinbelow.
[0050] Referring now to Figures 4A-4C (also see Figure 13), schematic diagrams of a computer system memory map, a GART table in the computer system memory and a GART table entry are illustrated. A logical memory map of the computer system memory 106 is generally indicated by the numeral 402, the graphics controller physical address space by the numeral 404, and the AGP device address space (virtual memory) by the numeral 406. The computer system 200 may address up to 4 gigabytes ("GB") of memory with a 32 bit address; however, some of this 4 GB of memory address space may be used for local memory associated with various devices, such as the memory of the AGP video graphics controller 210, which may include the local frame buffer 208, texture cache, alpha buffers, Z-buffers, etc., all being addressed within the graphics controller physical address space 404. In addition, according to the present invention, some of the memory address space 402 is used for the AGP device address space 406. In Figure 4A, the bottom (lowest address) of the computer system memory 106 is represented by the numeral 408 and the top (highest address) is represented by the numeral 410. In between the bottom 408 and the top 410 are various blocks or "pages" of AGP memory represented by the numeral 412. Each page 412 has a contiguous set of memory addresses.

[0051] In the present invention, some of these AGP memory pages (indicated by 412a, 412b and 412c) are used to store AGP information, such as textures, lists and the like, and at least one page (indicated by 414) is used to store entries in the GART table 414. The GART table 414 comprises a plurality of entries 418 (Figure 4B). Enough GART table entries 418 are stored to represent all of the associated AGP device address space 406 being used in the computer system 200. Each GART table entry 418 represents the base address 416 of the respective page 412 of the AGP memory. Another memory page may also be used to store a GART directory (not illustrated). The GART directory is used for two-level address remapping as more fully described hereinbelow. Each GART table entry 418 stores 32 binary bits of information (Figure 4C). The GART table 414 is used to remap AGP device address space 406 to addresses of the pages 412, by using the upper bits (31:12) to store a base address 416 for each of the corresponding 4 KB pages 412. The lower 12 bits of the AGP device address 406 are the same as the lower 12 bits of the address of the page 412, as more fully described hereinbelow. See also Figures 11A and 12A and the specification relating thereto. Thus the lower 12 bits (11:0), when using a 4 KB size page 412 addressed by each GART table entry 418, are free for other uses besides addressing AGP texture data. For other memory page sizes, different numbers of bits are available in the GART table entry 418 for the other uses and are contemplated herein.

[0052] Each GART table entry 418 may comprise four eight bit bytes for a total of 32 bits of binary information. If the twenty most significant bits 426 (31:12) (Figure 4C) in the GART table entry 418 are used for the base address 416 of the corresponding 4 KB page 412, the twelve least significant bits (11:0) are available for use by the systems designer in defining and/or customizing certain features and attributes associated with the memory page 412. These least significant bits are hereinafter referred to as "feature bits" or "feature flags."
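Packing and unpacking a 32-bit GART table entry with this split is a pair of masks. The C sketch below assumes 4 KB pages; because the meaning of individual feature flags is left to the systems designer (see Figure 13), the flags are treated here as an opaque 12-bit field, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GART_BASE_MASK    0xFFFFF000u /* bits 31:12: base address 416 of the 4 KB page */
#define GART_FEATURE_MASK 0x00000FFFu /* bits 11:0: feature flags */

/* Combine a page-aligned base address with up to 12 bits of feature flags. */
static uint32_t gart_entry_make(uint32_t page_base, uint32_t feature_flags) {
    assert((page_base & GART_FEATURE_MASK) == 0);  /* page must be 4 KB aligned */
    assert((feature_flags & GART_BASE_MASK) == 0); /* flags must fit in bits 11:0 */
    return page_base | feature_flags;
}

/* Recover the page base address used for translation. */
static uint32_t gart_entry_base(uint32_t entry) { return entry & GART_BASE_MASK; }

/* Recover the designer-defined feature flags. */
static uint32_t gart_entry_flags(uint32_t entry) { return entry & GART_FEATURE_MASK; }
```

For example, gart_entry_make(0x0ABCD000, 0x005) yields 0x0ABCD005, from which the base 0x0ABCD000 and flags 0x005 can be extracted independently.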

[0053] The video graphics controller 210 asserts addresses on the AGP bus 207 requesting the required graphical texture data. The AGP logic 318 receives these addresses for the requested graphical texture data, which reference the AGP device addresses 406; however, the AGP device addresses 406 are virtual addresses and do not physically exist in the computer system. The AGP logic 318 therefore must remap these AGP device addresses 406 into the actual AGP pages 412 residing in the memory 106. These AGP pages 412 are neither contiguous nor in any particular order. The GART table is used to remap the AGP device addresses 406 to the actual physical addresses of the AGP pages 412 residing in the physical memory 106 (logical memory map 402), as more fully described hereinabove and hereinbelow. The core logic caches a subset of the most recently used GART table entries 418 to increase AGP performance when performing the address translation. AGP address translation speed is improved whenever a read to the memory 106 is not needed to obtain a selected GART table entry 418, i.e., there is a GART cache 322 hit.

[0054] Referring now to Figure 5, a schematic functional block diagram and memory map of the present invention is illustrated. When the video graphics controller 210 requests graphics texture data on the AGP bus 207, the AGP logic 318 evaluates the asserted AGP device address space 406a to determine if the associated GART table entries 418a are in the cache 322. If the GART table entries 418a are in the cache 322 (a cache hit), the AGP logic 318 performs a memory read of the AGP pages 412 located in the physical memory 402 and remaps the pages 412 to the desired AGP device address space 406a, as more fully described below. However, if the necessary GART table entries 418 (Figure 4A) are not found in the cache 322, then the AGP logic 318 must first update the cache 322 with the necessary GART table entries 418.
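The hit/miss behavior of the GART cache 322 can be modeled in a few lines of C. This is a sketch, not the chipset's logic: it assumes a four-entry fully associative cache (matching the four entries drawn in Figure 5) with least-recently-used replacement, which the text only implies by saying the core logic caches the most recently used entries. The names gart_cache_t and gart_cache_lookup are hypothetical, and an array stands in for the GART table 414 in system memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GART_CACHE_WAYS 4u  /* four entries, as drawn in Figure 5 (assumption) */

typedef struct {
    bool     valid;
    uint32_t dev_page;  /* tag: AGP device address >> 12 */
    uint32_t entry;     /* cached copy of a GART table entry 418 */
    uint32_t last_use;  /* timestamp for LRU replacement */
} gart_cache_line_t;

typedef struct {
    gart_cache_line_t line[GART_CACHE_WAYS];
    uint32_t clock, hits, misses;
    const uint32_t *gart_table; /* stands in for GART table 414 in system memory */
    uint32_t agp_base;          /* base of AGP device address space (4 KB aligned) */
} gart_cache_t;

/* Return the GART table entry for dev_addr. A hit needs no memory read; a miss
   reads the entry from the table and replaces an empty or least recently used line. */
static uint32_t gart_cache_lookup(gart_cache_t *c, uint32_t dev_addr) {
    uint32_t page = dev_addr >> 12;
    unsigned victim = 0;
    assert(dev_addr >= c->agp_base);
    c->clock++;
    for (unsigned i = 0; i < GART_CACHE_WAYS; i++) {
        gart_cache_line_t *l = &c->line[i];
        if (l->valid && l->dev_page == page) {
            l->last_use = c->clock;
            c->hits++;
            return l->entry;
        }
        /* Prefer an empty line; otherwise keep the oldest valid line. */
        if (!l->valid ||
            (c->line[victim].valid && l->last_use < c->line[victim].last_use))
            victim = i;
    }
    c->misses++;
    gart_cache_line_t *v = &c->line[victim];
    v->valid    = true;
    v->dev_page = page;
    v->entry    = c->gart_table[(dev_addr - c->agp_base) >> 12]; /* memory read */
    v->last_use = c->clock;
    return v->entry;
}
```

Two accesses within the same 4 KB page cost one miss and one hit; touching a fifth distinct page evicts whichever cached page was used longest ago.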

[0055] Figure 5 illustrates four GART table entries for illustrative clarity; however, any number of GART table entries may be cached in the core logic chipset 204 of the present invention and are contemplated herein. The GART table entries 418 are read from the GART table 414 located in the physical memory 106. Once the selected GART table entries 418a are written into the cache 322, the AGP pages 412 may be read from the physical memory 106. The AGP pages 412 are not stored in the AGP logic 318 but are used by the video graphics controller 210 directly from the memory 106. The AGP logic 318 acts as an address translator to remap the randomly ordered and non-contiguous AGP pages 412 into a contiguous AGP device address space 406. One-level GART and two-level GART translations, according to the present invention, are more fully described hereinbelow.

[0056] In an embodiment of the present invention, the AGP logic 318 utilizes the cache memory 322 to store selected ones of the GART table entries 418a if a single-level address remapping is implemented and, in addition, selected ones of a GART directory if a two-level GART address remapping is implemented (not illustrated). Each cache location stores a selected one 418a from the GART table 414 and is referenced to a page boundary of a linear address 406a of the AGP device address space 406. The selected one 418a of the GART table 414 stored in the cache 322 has the base address 416 of the page 412 of graphics data stored in physical memory 106 and flag bits described hereinabove. Register logic is also provided in the AGP logic 318 to associate a GART Cache Entry Update bit 502 and a GART Cache Entry Invalidate bit 504 with each of the cached GART table entries 418a.

[0057] AGP memory-mapped control registers (Figure 19A) of the core logic chipset 204 are accessed via a base address register BAR1 1704 (Figures 17A and 17C) residing in a host-to-PCI bridge configuration header (Figure 17A). A base address is determined and written to the base address register BAR1 1704 by the system BIOS during POST. The AGP memory-mapped control registers (Figure 19A) are used by the GART miniport driver to dynamically control AGP functionality within the core logic chipset 204 during operation of the computer system 200. Preferably, a GART Cache Entry Control Register 1916 (Figures 19A and 19I) is used by software, such as the GART miniport driver, to update/invalidate a specific GART cache entry 418a. When the GART miniport driver receives a call to update/invalidate entries in the GART table (located in the system physical memory), it is also required to maintain coherency of the GART table entries 418a in the cache 322. If the particular update/invalidate GART table entry is not present in the GART cache 322, the invalidate function will have no effect. If a cached GART table entry 418a needs to be updated, the AGP logic 318 may prefetch the new GART table entry from system physical memory 106 and thus further speed address translations.

[0058] The GART Cache Entry Control Register 1916 may be 32 bits wide (double word) and comprises a GART Entry Offset 1950 (Figure 19I) having a plurality of bits (31:12) which define the AGP device address 406a of the particular GART table entry 418a to be invalidated/updated, a GART Cache Entry Update 1952 (bit number 1) and a GART Cache Entry Invalidate 1954 (bit number 0). The GART miniport driver derives this device address from the linear address (Lin-to-Dev command). When a device address 406a is written to the GART Cache Entry Control Register 1916 by the GART miniport driver, the AGP logic 318 invalidates/updates the referenced cache entry based upon the appropriate setting in the GART Cache Entry Update 1952 and/or GART Cache Entry Invalidate 1954 as follows: When the GART Cache Entry Update 1952 is set to a logic 1, the AGP logic 318 updates the cached GART table entry 418a referenced by the GART Entry Offset 1950 bits 31:12 with the current entry 418 in the GART table 414 in the system memory 106. The update function is performed following the write to this register. When the update operation is completed, the AGP logic 318 may reset the GART Cache Entry Update 1952 to 0. The GART miniport driver may poll the GART Cache Entry Update 1952 to verify completion of the update operation. When the GART Cache Entry Invalidate 1954 is set to a logic 1, the AGP logic 318 invalidates the cached GART table entry 418a referenced by the GART Entry Offset bits 31:12, if present in the GART entry cache 322. The invalidate function may be performed immediately following the write to this register. When the invalidate operation is completed, the AGP logic 318 may reset the GART Cache Entry Invalidate 1954 to 0. The GART miniport driver may poll the GART Cache Entry Invalidate 1954 to verify completion of the invalidate operation.
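The driver's side of this protocol, writing a request and then polling bits 1:0, can be sketched in C. This is a sketch under stated assumptions: the register is modeled as a plain struct with a hypothetical chipset_step callback standing in for the AGP logic 318 (real driver code would use a volatile pointer into the BAR1 memory-mapped region), and the poll is bounded because the text says the chipset "may" reset the bits, not that it must. All function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GCEC_OFFSET_MASK 0xFFFFF000u /* bits 31:12: GART Entry Offset 1950 */
#define GCEC_UPDATE      (1u << 1)   /* bit 1: GART Cache Entry Update 1952 */
#define GCEC_INVALIDATE  (1u << 0)   /* bit 0: GART Cache Entry Invalidate 1954 */

/* Hypothetical register model: chipset_step stands in for the AGP logic acting on
   the last write and resetting the bits of operations it has completed. */
typedef struct gcec_reg {
    uint32_t value;
    void (*chipset_step)(struct gcec_reg *reg);
} gcec_reg_t;

/* Build the value the driver writes for the cache entry that maps dev_addr. */
static uint32_t gcec_encode(uint32_t dev_addr, bool update, bool invalidate) {
    return (dev_addr & GCEC_OFFSET_MASK) |
           (update ? GCEC_UPDATE : 0u) |
           (invalidate ? GCEC_INVALIDATE : 0u);
}

/* Write a request, then poll bits 1:0 until they read back 0 or max_polls is spent. */
static bool gcec_request(gcec_reg_t *reg, uint32_t dev_addr, bool update,
                         bool invalidate, unsigned max_polls) {
    assert(update || invalidate);  /* at least one operation must be requested */
    reg->value = gcec_encode(dev_addr, update, invalidate);
    for (unsigned i = 0; i < max_polls; i++) {
        reg->chipset_step(reg);                 /* time passes on real hardware */
        if ((reg->value & (GCEC_UPDATE | GCEC_INVALIDATE)) == 0)
            return true;                        /* both bits reset: done */
    }
    return false;                               /* not confirmed within the budget */
}

/* Toy chipset model: completes one pending operation per step, invalidate first. */
static void demo_chipset_step(gcec_reg_t *reg) {
    if (reg->value & GCEC_INVALIDATE)
        reg->value &= ~GCEC_INVALIDATE;
    else
        reg->value &= ~GCEC_UPDATE;
}
```

With the toy model, a combined invalidate-and-update request needs two polls to complete; a budget of one poll reports that completion was not confirmed.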

[0059] The AGP logic 318 may also comprise additional bits associated with the GART Cache Entry Update 1952 and the GART Cache Entry Invalidate 1954 of the GART Cache Entry Control Register 1916 for each cache memory 322 location. Referring to Figure 5, a GART Cache Entry Update bit 502 and a GART Cache Entry Invalidate bit 504 are used to store the logic level values from the GART Cache Entry Update 1952 and the GART Cache Entry Invalidate 1954, respectively, which are associated with the device address 406a for each of the GART table entries 418a in the cache memory 322. Bits 502 and 504 allow the AGP logic 318 to automatically invalidate the GART table entry(ies) 418a marked invalid (those with bit 504 set to logic "1") or prefetch into the cache 322 the GART table entry(ies) 418a marked to be updated (those with bit 502 set to logic "1"). Bits 502 and 504 may also be reset by the AGP logic after the indicated operation has been performed.
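How per-entry bits 502 and 504 might be recorded and later serviced can be sketched in C. The sketch assumes a four-line cache, applies the invalidation before the refetch when both bits are set (the order suggested by the example in [0032]), and drops a request for an entry that is not cached, consistent with [0057]. The names gc_line_t, mark_entry, and service_marks are hypothetical, and an array stands in for the GART table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GC_LINES 4u

typedef struct {
    bool     valid;
    uint32_t dev_page;        /* tag: AGP device address >> 12 */
    uint32_t entry;           /* cached GART table entry 418a */
    bool     update_502;      /* GART Cache Entry Update bit 502 */
    bool     invalidate_504;  /* GART Cache Entry Invalidate bit 504 */
} gc_line_t;

/* Record a request against the cached entry for dev_addr. Returns false, changing
   nothing, if that entry is not cached. */
static bool mark_entry(gc_line_t *cache, uint32_t dev_addr, bool update, bool invalidate) {
    for (unsigned i = 0; i < GC_LINES; i++) {
        if (cache[i].valid && cache[i].dev_page == (dev_addr >> 12)) {
            cache[i].update_502     |= update;
            cache[i].invalidate_504 |= invalidate;
            return true;
        }
    }
    return false;
}

/* Service pending marks: drop lines marked invalid, then prefetch lines marked for
   update from the GART table, and reset each bit once its operation is done. */
static void service_marks(gc_line_t *cache, const uint32_t *gart_table, uint32_t agp_base) {
    for (unsigned i = 0; i < GC_LINES; i++) {
        gc_line_t *l = &cache[i];
        if (l->invalidate_504) {
            l->valid = false;
            l->invalidate_504 = false;
        }
        if (l->update_502) {
            uint32_t index = ((l->dev_page << 12) - agp_base) >> 12;
            assert((l->dev_page << 12) >= agp_base);
            l->entry = gart_table[index];   /* prefetch the new entry */
            l->valid = true;
            l->update_502 = false;
        }
    }
}
```

An entry marked only for invalidation leaves the cache, while one marked for both comes back holding the current table value, ready for the next AGP request.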


AGP Specification 

[0060] The Intel AGP Specification entitled "Accelerated Graphics Port Interface Specification Revision 1.0," dated July 31, 1996, incorporated by reference hereinabove, provides signal, protocol, electrical, and mechanical specifications for the AGP bus. However, further design must be implemented before a fully functional computer system with AGP capabilities is realized. The following disclosure defines the implementation specific parts of an AGP interface according to the present invention. The following disclosure includes the GART table, buffer depths, latencies, registers, and driver functionality and interfaces so that one of ordinary skill in the art may practice the present invention without undue experimentation when used with the aforementioned Intel AGP Specification incorporated by reference herein.

[0061] Moving textures and other information required by the graphics controller, such as command lists, out of the local frame buffer into system memory creates a problem: the presently implemented prior art computer system architecture, illustrated in Figure 1, cannot support the bandwidth requirements of tomorrow's 3-D graphics enhanced applications. The standard PCI bus 109 (33 MHz, 32 bit) bandwidth is 132 MB/s peak and 50 MB/s typical. Microsoft Corporation estimates that future graphics applications will require in excess of 200 MB/s. This means that the PCI bus 109 in the computer system architecture illustrated in Figure 1 will likely starve the graphics controller 110 as well as other PCI devices (122, 124, 114, 116 and 118) also trying to access the PCI bus 109.
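The 132 MB/s peak figure follows from the bus clock and width, assuming one 32-bit transfer per clock and counting 1 MB as 10^6 bytes; even this peak falls short of the 200 MB/s estimate:

```latex
B_{\mathrm{PCI,peak}} = 33 \times 10^{6}\ \mathrm{s^{-1}} \times \frac{32\ \mathrm{bits}}{8\ \mathrm{bits/byte}}
  = 132 \times 10^{6}\ \mathrm{B/s} = 132\ \mathrm{MB/s} < 200\ \mathrm{MB/s}
```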

AGP Architecture 

[0062] To remedy this situation, Intel developed the AGP architecture illustrated in Figures 2 and 3. In the Intel AGP architecture, a graphics controller 210 is removed from the existing PCI bus 109 and placed on a higher bandwidth AGP bus 207. This AGP bus 207 has a peak bandwidth of 532 megabytes per second ("MB/s"). The bandwidth bottleneck now exists in the core logic chipset 204 and the memory bus 105, which have to handle requests from the host bus 103, the PCI bus 109, and the AGP bus 207 (Figure 2), as well as memory 106 refreshing by the memory interface and control 304. However, with the introduction of faster memory 106 and highly integrated, faster chipsets, this problem becomes manageable.

[0063] Understanding the necessity for the Graphics Address Remapping Table ("GART table") requires a full understanding of the AGP addressing scheme. Referring now to Figures 10A and 10B, schematic memory maps of an AGP single-level address translation and an AGP two-level address translation, respectively, are illustrated. In the prior art computer system architecture illustrated in Figure 1, the graphics controller's physical address space resides above the top 410 of system memory 106. The graphics controller 110 used this physical address space for the local frame buffer 108, texture cache, alpha buffers, Z-buffers, etc. In the AGP system, information still resides in the graphics controller memory (alpha, Z-buffer, local frame buffer 108, etc.), but some data which previously resided in the prior art local frame buffer 108 is moved to system memory 106 (primarily textures, but also command lists, etc.). The address space employed by the graphics controller 210 to access these textures becomes virtual, meaning that the physical memory corresponding to this address space doesn't actually exist above the top of memory. In reality, each of these virtual addresses corresponds to a physical address in the system memory 106. The graphics controller 210 addresses this virtual address space, referenced hereinabove and hereinafter as "AGP device address space," as one contiguous block of memory 406, but the corresponding physical addresses are allocated in 4 KB, non-contiguous pages 412 throughout the computer system memory 106.

[0064] A system, method and apparatus is needed to remap the graphics controller's contiguous AGP device addresses into their corresponding physical addresses that reside in the system memory 106. This is the function of the GART table. The GART table resides in the physical memory 106 (Figure 1), and is used by the core logic chipset 204 to remap AGP device addresses that can originate from either the AGP bus 207, host bus 103, or PCI bus(es) 109. The GART table is managed by a GART miniport driver. In the present invention, the GART table implementation supports two options for remapping AGP addresses: single-level address translation and two-level address translation.

Single-Level GART Table Translation

[0065] A single-level address translation may improve overall AGP performance by reducing the number of GART table entry lookups required by the chipset. Single-level means that the chipset need only perform one GART table lookup to get the physical address of the desired page (table -> page). This is possible because the GART table is allocated by the operating system into one single, contiguous block of uncacheable memory. Allocation of this memory is typically performed early in the initialization process to ensure that contiguous memory is available. However, defragmentation of the computer system memory to obtain the necessary contiguous memory space at any time during operation of the computer system is contemplated herein.

[0066] In a computer system using single-level address translation, the AGP device addresses used by the graphics controller can be viewed as consisting of three parts as illustrated in Figure 11A: the base address of device address space (bits 31:x), the page offset into AGP device address space (bits x:12), and the offset into the 4 KB page (bits 11:0). Note that the page offset into AGP device address space can also be used as an entry index into the GART table. Also note that the number of bits comprising the page offset into AGP device address space depends upon the size of virtual (and physical) memory allocated to AGP. For instance, it takes 13 bits to represent all of the pages in a system with 32 MB of AGP memory (32 MB / 4 KB = 8192 = 2^13 pages). The table of Figure 11B illustrates the number of bits required to represent each 4 KB page in AGP memory versus the size of the AGP memory.
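The bit counts tabulated in Figure 11B follow from dividing the AGP memory size by the 4 KB page size and taking the base-2 logarithm. A minimal sketch of that arithmetic, assuming (as in the examples given) that the AGP memory size is a power of two; the function name is illustrative only:

```c
#include <stdint.h>

/* Bits needed to index every 4 KB page of AGP memory, i.e. the width of
 * the "page offset into AGP device address space" field.
 * Assumes agp_bytes is a power of two and at least 4 KB. */
unsigned gart_index_bits(uint64_t agp_bytes)
{
    uint64_t pages = agp_bytes >> 12;   /* number of 4 KB pages */
    unsigned bits = 0;
    while ((1ULL << bits) < pages)      /* smallest n with 2^n >= pages */
        bits++;
    return bits;
}
```

For 32 MB this yields 13 bits, matching the example in the text; 256 MB yields 16 bits.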

[0067] System memory requires an address with the format illustrated in Figure 11A. This address consists of the base address of the 4 KB page (bits 31:12) and the offset into the 4 KB page (bits 11:0). The base address of each 4 KB page is information required by the GART table to remap corresponding device addresses. The offset into the 4 KB page is the same offset that exists in the AGP device address.

[0068] Referring now to Figure 11C, a schematic flow diagram for converting device addresses into physical addresses in a single-level address translation is illustrated. The base address of AGP device address space, along with the size of AGP memory, can optionally be used by the chipset to determine if the address in the request falls within AGP device address space before remapping occurs. To remap the address, the page offset from the AGP base address is multiplied by the size of a single GART table entry (4 bytes) and added to the base address of the GART table. This provides the physical address of the required GART table entry. This entry is retrieved from the GART table, which resides in system memory. Within this GART table entry is the base address of the desired 4 KB page, a page which resides somewhere in system memory. Adding the offset into the 4 KB page to this base address yields the required physical address. Note that the offset into the 4 KB page in virtual AGP memory (bits 11:0) is equivalent to the offset into the 4 KB page in physical (system) memory.
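The single-level lookup can be modelled in a few lines. This is a minimal sketch, not the chipset's logic: it assumes the GART table is handed over as a C array (so `gart_table[page_index]` stands in for the memory read at GART base + page_index × 4), and the function and parameter names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define GART_PAGE_SHIFT 12u
#define GART_PAGE_MASK  0xFFFu       /* offset into the 4 KB page, bits 11:0 */
#define GART_ENTRY_BASE 0xFFFFF000u  /* page base address in an entry, bits 31:12 */

/* Single-level translation of an AGP device address to a physical address.
 * Returns false if the address lies outside AGP device address space. */
bool agp_translate_single(uint32_t dev_addr, uint32_t agp_base, uint32_t agp_size,
                          const uint32_t *gart_table, uint32_t *phys_out)
{
    /* Optional range check against the AGP base address and size. */
    if (dev_addr < agp_base || dev_addr - agp_base >= agp_size)
        return false;

    /* Page offset into AGP device address space doubles as the entry index;
     * in hardware the entry lives at GART base + page_index * 4 bytes. */
    uint32_t page_index = (dev_addr - agp_base) >> GART_PAGE_SHIFT;
    uint32_t entry = gart_table[page_index];

    /* Page base from the entry plus the unchanged offset into the page. */
    *phys_out = (entry & GART_ENTRY_BASE) | (dev_addr & GART_PAGE_MASK);
    return true;
}
```

With the AGP aperture at 0xE0000000 and entry 1 holding 0x00735001, device address 0xE0001ABC maps to physical address 0x00735ABC. The sketch ignores the flag bits in bits 11:0 of the entry; checking them is a separate decision.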

Two-Level GART Table Translation 

[0069] Two-level address translation requires two GART table lookups to remap an AGP device address to a physical address in memory (directory -> page -> table). The first lookup reads the GART directory entry from system memory. The GART directory entry contains the physical address of a corresponding page of GART table entries, also residing in physical memory. A second lookup is required to retrieve the appropriate GART table entry, which then points to the base address of the desired 4 KB page of AGP data in the computer system physical memory.
[0070] In some designs, two-level address translation may be preferred over single-level address translation because it is not necessary for the GART directory and the 4 KB pages comprising the GART table to be contiguous. The operating system may be more likely to successfully allocate physical memory for the GART table using two-level address translation since a large block of contiguous memory is not needed. Dynamic allocation of the GART table using either single-level or two-level address translation is contemplated in the present invention.
[0071] In a system using two-level address translation, the device addresses used by the graphics controller can be viewed as consisting of four parts as illustrated in Figure 12A: the base address of AGP device address space (bits 31:x), the directory offset into AGP device address space (bits x:22), the page offset into a table entry (bits 21:12), and the offset into the 4 KB page (bits 11:0). Note that the number of bits comprising the directory offset into AGP device address space depends upon the size of AGP device address space. For instance, it takes 6 bits to represent all of the GART directory entries (64) in a system with 256 MB of AGP memory. Since each GART directory entry corresponds to 4 MB of address space (i.e., 1024 pages), each page offset can be addressed using 10 bits. The table of Figure 12B illustrates the number of bits required to represent the GART directory and page in AGP memory versus the size of AGP memory.

[0072] Referring now to Figure 12C, a schematic flow diagram for converting device addresses into physical addresses in a two-level address translation is illustrated. The base address of AGP device address space (bits 31:x), along with the size of AGP memory, can optionally be used by the chipset 204 to determine if the address in the request falls within AGP device address space before remapping occurs. To remap the address, the directory offset (bits x:22) is multiplied by the size of a single GART directory entry (4 bytes) and added to the base address of the GART directory (i.e., the base address of the 4 KB page containing the directory). This provides the physical address of the required GART directory entry. The GART directory entry is retrieved from physical memory, and within this GART directory entry is the physical address of the base of the 4 KB page holding the GART table entry corresponding to the request. To get the GART table entry, the page offset (bits 21:12) is multiplied by the size of a single GART table entry (4 bytes) and added to the base address of the retrieved page of the GART table. This GART table entry is then fetched from memory, and within this GART table entry is the base address of the desired 4 KB page of AGP graphics data. The AGP graphics data page resides in system memory. Adding the offset into the 4 KB page (bits 11:0) to this base address yields the required physical address. Note that the offset into the 4 KB page in AGP device address space (bits 11:0) is equivalent to the offset into the AGP data 4 KB page in physical (system) memory.
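The two dependent memory reads of Figure 12C can be sketched as follows. The sketch assumes a hypothetical memory model in which physical memory is a word array `mem` (the 32-bit word at physical address A is `mem[A / 4]`); the directory and page indices are taken from the offset relative to the AGP base, as the text describes.

```c
#include <stdint.h>

#define DIR_SHIFT   22u          /* directory offset: bits x:22 */
#define PAGE_SHIFT  12u          /* page offset: bits 21:12 */
#define INDEX_MASK  0x3FFu       /* 10 bits: 1024 entries per 4 KB page */
#define OFFSET_MASK 0xFFFu       /* offset into the 4 KB page: bits 11:0 */
#define BASE_MASK   0xFFFFF000u  /* 4 KB aligned base address */

/* Hypothetical model of a 32-bit physical memory read. */
static uint32_t read_phys32(const uint32_t *mem, uint32_t addr)
{
    return mem[addr >> 2];
}

uint32_t agp_translate_two_level(uint32_t dev_addr, uint32_t agp_base,
                                 uint32_t gart_dir_base, const uint32_t *mem)
{
    uint32_t rel        = dev_addr - agp_base;
    uint32_t dir_index  = rel >> DIR_SHIFT;
    uint32_t page_index = (rel >> PAGE_SHIFT) & INDEX_MASK;

    /* First lookup: directory entry -> base of a page of GART table entries. */
    uint32_t table_page = read_phys32(mem, gart_dir_base + dir_index * 4u) & BASE_MASK;

    /* Second lookup: GART table entry -> base of the 4 KB AGP data page. */
    uint32_t tbl_entry = read_phys32(mem, table_page + page_index * 4u);

    return (tbl_entry & BASE_MASK) | (dev_addr & OFFSET_MASK);
}
```

With the directory at 0x1000, directory entry 1 pointing at a table page at 0x2000, and table entry 5 of that page holding 0x00ABC001, device address 0xE0405123 (AGP base 0xE0000000) resolves to 0x00ABC123.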

[0073] In a two-level address translation, both a GART table and a GART directory are required. In a single-level address translation, only the GART table is necessary. The format for the GART table and use thereof are identical for both the single-level and the two-level address translations.

GART Table 

[0074] Referring now to Figure 13, a schematic memory map of a GART table is illustrated. Each entry in the GART table is four bytes long and may comprise the following information: page base address (bits 31:12), dirty bit (bit 2), link bit (bit 1), and valid bit (bit 0). The page base address (bits 31:12) specifies the physical address of the first byte of the corresponding 4 KB page in physical memory. The bits in this field (bits 31:12) are interpreted as the twenty most significant bits of the physical address and align the associated page on a 4 KB boundary. The page base address is initialized and managed by the GART miniport driver.

[0075] Bits 11:0 may be used as flag bits to customize and characterize each associated page. The present invention allows future enhancements to the AGP Specification by utilizing these flag bits. For example, a cacheability flag bit may indicate whether the 4 KB page is cacheable, and a write combinable bit may indicate whether the 4 KB page is write combinable. More specific examples of the present invention are as follows:

[0076] Bit 2 may be used as a dirty bit. The dirty bit may indicate when the page referenced by this GART table entry has been modified.

[0077] Bit 1 may be used as a link bit. The link bit may be set and managed by the GART miniport driver. It indicates that the next GART table entry is associated with the current GART table entry. The link bit can be used by the chipset when prefetching GART table entries as part of a GART table lookup. If the link bit is set in the first GART table entry, the chipset may cache the second entry. If the link bit in the second entry is set, then the third entry may get cached. This may continue until the link bit is not set in one of the entries, and can be utilized when doing a normal cache read so that no more than the necessary number of GART table entries are cached, i.e., a full cache line read is not needed. The link bit is also useful when textures overlap into contiguous 4 KB pages within AGP device address space.
[0078] Bit 0 may be used as a present flag. This present flag indicates whether the AGP data page being pointed to by the GART table entry has been reserved by the GART miniport driver. When the present flag is set, the AGP data page has been reserved in physical memory and address translation may be carried out. When the present flag is clear, the AGP data page has not been reserved in memory and the chipset must determine whether to perform the translation or generate an error (SERR#). The present flag does not necessarily indicate whether the entry actually maps to an AGP data page, but that the GART table entry has been reserved for an application by the GART miniport driver.
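The entry format of Figure 13 and the link-bit prefetch rule can be illustrated together. This is a minimal sketch: the bit positions come from the text, but the helper names are hypothetical, and the `max` bound stands in for whatever limit (such as the cache size) a real chipset would impose.

```c
#include <stdint.h>
#include <stdbool.h>

#define GART_VALID (1u << 0)    /* bit 0: valid/present */
#define GART_LINK  (1u << 1)    /* bit 1: next entry is associated with this one */
#define GART_DIRTY (1u << 2)    /* bit 2: page has been modified */
#define GART_BASE  0xFFFFF000u  /* bits 31:12: page base address */

/* Build an entry for a 4 KB aligned physical page. */
uint32_t gart_make_entry(uint32_t page_phys, bool link)
{
    return (page_phys & GART_BASE) | GART_VALID | (link ? GART_LINK : 0u);
}

/* Number of entries a chipset would fetch starting at `first`: the first
 * entry plus each following entry while the previous one has its link bit
 * set. The caller must ensure table has at least first + max entries. */
unsigned gart_linked_run(const uint32_t *table, unsigned first, unsigned max)
{
    unsigned n = 0;
    while (n < max) {
        uint32_t e = table[first + n];
        n++;
        if (!(e & GART_LINK))
            break;               /* link clear: stop prefetching */
    }
    return n;
}
```

For a table whose first two entries have the link bit set and whose third does not, a lookup starting at entry 0 fetches exactly three entries rather than a full cache line.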
GART Directory



[0079] Referring now to Figure 14, a schematic memory map of entries in a GART directory, a page of GART table entries, and the AGP memory is illustrated. The GART directory may be contained in a single 4 KB page residing in uncacheable physical memory. Since each GART directory entry may be 4 bytes long, 1024 GART directory entries can exist within a single 4 KB page in the computer system memory. Thus, up to 4 GB of physical memory (1024 entries x 4 MB each) may be addressed with a single 4 KB page for the GART directory. To support up to 2 GB of AGP device address space, only 512 GART directory entries are required. Bits 31:0 contain the base address (offset = 0) of the GART directory entry's corresponding page of the GART table, which may also reside in physical memory.

[0080] GART table size is a function of the amount of AGP memory required by the system. In a system using a single-level address translation, size is computed using the following equation:

$$\text{GART Size (Bytes)} = \frac{\text{AGP Memory Required}}{\text{Page Size}} \times \text{GART Entry Size (4 Bytes)}$$

Where

AGP Memory Required = The amount of system memory dedicated to AGP
Page Size = Standard page size in system memory
GART Entry Size = The size of a single entry in the GART table



[0081] Note that this equation computes maximum GART table size based upon the amount of AGP device address space reserved. The amount of actual GART table memory reserved may depend upon the operating system.
[0082] In a two-level address translation, an additional 4 KB page (4096 bytes) is required for the GART directory. In a system using the two-level address translation, size is computed using the following equation:

$$\text{GART Size (Bytes)} = \frac{\text{AGP Memory Required}}{\text{Page Size}} \times \text{GART Entry Size} + \text{Page Size}$$

[0083] Referring to Figure 15, a table showing the correlation between allocated AGP memory and the maximum size of the GART table is illustrated. For clarity only, implementations of GART tables based upon AGP memory requirements of 32 MB, 64 MB, 128 MB, 256 MB, 512 MB, 1 GB, and 2 GB are illustrated; however, any AGP memory size may be accommodated and is contemplated to be within the scope of the present invention. Note that the two-level translation requires one additional 4 KB page for its directory.
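The two sizing equations translate directly into code. A minimal sketch, assuming the 4 KB page size and 4-byte entry size stated in the text; the function names are illustrative:

```c
#include <stdint.h>

#define PAGE_SIZE       4096ULL  /* standard page size in system memory */
#define GART_ENTRY_SIZE 4ULL     /* bytes per GART table entry */

/* Maximum GART table size for single-level translation. */
uint64_t gart_size_single(uint64_t agp_bytes)
{
    return agp_bytes / PAGE_SIZE * GART_ENTRY_SIZE;
}

/* Two-level translation adds one 4 KB page for the GART directory. */
uint64_t gart_size_two_level(uint64_t agp_bytes)
{
    return gart_size_single(agp_bytes) + PAGE_SIZE;
}
```

By these equations, 32 MB of AGP memory needs a 32 KB table (36 KB with the directory), and 2 GB needs a 2 MB table.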

AGP Logical Architecture

[0084] Referring now to Figure 16, a functional block diagram of the AGP chipset 204 according to the present invention is illustrated. The AGP chipset 204 performs two main functions: Host to PCI Bridge functions (function 0) and PCI to PCI bridge functions (function 1). The Host to PCI bridge is the standard interface generally found in a PCI-based core logic. The PCI to PCI bridge is used to facilitate the configuration of the AGP port without changing existing bus enumeration code. Each of these functions has its own configuration registers, which reside in its own PCI configuration header type as required by the PCI 2.1 Specification. These configuration registers are listed in Figures 17A and 18A, respectively, and more detailed register bit information for the AGP specific registers is listed in Figures 17B, 17C and 18B-18M. Note that the AGP chipset implements the New Capabilities mechanism as more fully described in the Engineering Change Notice ("ECN") entitled "Addition of 'New Capabilities' Structure," dated May 20, 1996, which is herein incorporated by reference. The New Capabilities structure is implemented as a linked list of registers containing information for each function supported by the device. The AGP registers are included in the linked list.
[0085] The PCI-PCI bridge 320 function need not be a fully functional PCI-PCI bridge. It need only allow memory write transactions that originate on the PCI bus 109 to be forwarded to the AGP bus 207. It does not have to do AGP to PCI memory write transactions. Nor does it have to allow other PCI commands, such as I/O (read and write), configuration (read and write), memory read (memory read, memory read line, memory read multiple), special cycles and interrupt acknowledge, to cross the interface. These limitations only apply to the PCI-AGP and AGP-PCI interface. All Host to AGP and Host to PCI commands are supported by the present invention.

[0086] AGP compliant masters have certain memory requirements that must be placed in the system memory map using the Memory Base, Memory Limit, Prefetchable Memory Base, and Prefetchable Memory Limit registers found at offsets 20h, 22h, 24h, and 26h, respectively. Host-to-PCI (Function 0) and PCI-to-PCI (Function 1) device IDs also may be different to accommodate Microsoft's policy regarding device drivers for multifunction devices. The following set of registers, described below, preferably are registers that may be required to implement an AGP compliant core logic chipset according to the present invention.

Host to PCI Bridge 


[0087] Referring to Figure 17A, a schematic table of registers for the host to PCI bridge 306 function, according to an embodiment of the present invention, is illustrated. A Base Address Register 0 (BAR0) 1702 is used by system BIOS memory mapping software to allocate AGP device address space for the AGP compliant master. Figure 17B illustrates the functional description of the bits used in this register. System BIOS determines the size and type of address space required for AGP implementation by writing all ones to BAR0 1702 and then reading from the register. By scanning the returned value from the least-significant bit of BAR0 1702 upwards, BIOS can determine the size of the required address space. The binary-weighted value of the first one bit found indicates the required amount of space. Once the memory has been allocated by BIOS, the base address of the AGP device address space is placed in bits 31:4 of this register. This register also contains information hard-wired to indicate that this is a prefetchable memory range that can be located anywhere in 32-bit address space. Any other means for determining the required AGP device address space may also be used and is contemplated herein.
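The "write all ones, read back, find the first one bit" procedure can be sketched as below. The sketch assumes the standard PCI encoding of a 32-bit memory BAR, in which bits 3:0 are hard-wired type bits (bit 3 = prefetchable) and so are excluded from the scan; the actual configuration writes and reads are left out, and the function names are illustrative.

```c
#include <stdint.h>

/* Size of the address space requested by a 32-bit memory BAR, given the
 * value read back after writing 0xFFFFFFFF to it. */
uint32_t bar_size_from_readback(uint32_t readback)
{
    uint32_t addr_bits = readback & ~0xFu;   /* drop hard-wired type bits 3:0 */
    if (addr_bits == 0)
        return 0;                            /* BAR not implemented */
    /* Lowest set address bit: its binary weight is the required size. */
    return addr_bits & (~addr_bits + 1u);
}

/* Bit 3 of a memory BAR reports a prefetchable range. */
unsigned bar_is_prefetchable(uint32_t bar)
{
    return (bar >> 3) & 1u;
}
```

A read-back of 0xFC000008, for instance, describes a 64 MB prefetchable range.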

[0088] Accesses to a PCI device's configuration space are relatively slow. In Intel x86 based computer systems, one PCI register access requires two I/O cycles: one to the PCI address register (address CF8h) and the other to the PCI data register (address CFCh). Processor-related I/O cycles are also slower than memory cycles. Therefore, in the present invention, a Base Address Register 1 (BAR1) 1704 may be used by the GART miniport driver to access memory-mapped AGP control registers. Figure 17C illustrates the functional description of the bits used in this register. System BIOS determines the size and type of address space required by the AGP memory-mapped control registers by writing all ones to BAR1 1704 and then reading from the register. By scanning the returned value from the least-significant bit of BAR1 1704 upwards, BIOS can determine the size of the required memory address space. The binary-weighted value of the first one bit found indicates the required amount of space. Once the memory has been allocated by BIOS, the base address of the AGP memory address space is placed in bits 31:4 of this register. This register also contains information hard-wired to indicate that this is a non-prefetchable memory range that can be located anywhere in 32-bit address space. Any other means for determining the required memory address space may also be used and is contemplated herein.
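To make the two-I/O-cycle cost concrete, the value written to the CF8h address port under x86 PCI configuration mechanism #1 is sketched below; a second I/O cycle to CFCh then moves the data. Only the address encoding is shown, since port I/O itself is platform-specific; the function name is illustrative.

```c
#include <stdint.h>

/* CONFIG_ADDRESS value for PCI configuration mechanism #1 (port CF8h). */
uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func, uint8_t reg)
{
    return 0x80000000u                       /* bit 31: enable configuration cycle */
         | ((uint32_t)bus << 16)             /* bits 23:16: bus number */
         | ((uint32_t)(dev & 0x1Fu) << 11)   /* bits 15:11: device number */
         | ((uint32_t)(func & 0x7u) << 8)    /* bits 10:8: function number */
         | (uint32_t)(reg & 0xFCu);          /* bits 7:2: dword-aligned register */
}
```

Reading BAR1 (offset 14h) of function 0 this way costs an I/O write plus an I/O read, whereas a register mapped through BAR1 is reached with a single memory access.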

PCI to PCI Bridge

[0089] Referring to Figure 18A, a schematic table of registers for the PCI to PCI bridge 320 (function 1), according to an embodiment of the present invention, is illustrated. A Command Register 1806 provides coarse control over the PCI-to-PCI bridge 320 function within the core logic chipset 204. This register controls the ability to generate and respond to PCI cycles on both the AGP bus 207 and PCI bus 109. Figure 18B illustrates the functional description of the bits used in the Command Register 1806.

[0090] A Status Register 1808 provides coarse status of the PCI-to-PCI bridge 320 function within the core logic chipset 204. Figure 18C illustrates the functional description of the bits used in the Status Register 1808. The Status Register 1808 is included in this specification to emphasize that the Capabilities Pointer Supported bit (bit 4) should be set to 1 in a host bridge compliant with implementation of the present invention. When a status bit is set, the bit is cleared using standard procedures as specified by the PCI Specification (i.e., write a "1" to the bit).
[0091] A Secondary Status Register 1810 is similar in function and bit definition to the Status Register 1808 (offset 06h); however, its bits reflect status conditions of the secondary side of the PCI-to-PCI bridge 320 interface connected to the AGP bus 207. Figure 18D illustrates the functional description of the bits used in the Secondary Status Register 1810. Aside from the redefinition of bit 14 as defined in the PCI-to-PCI bridge specification, the 66 MHz capable bit (bit 5) has been redefined for AGP. When a status bit is set, the bit is cleared using standard procedures as specified by the PCI Specification (i.e., write a "1" to the bit).

[0092] A Memory Base Register 1812 is used by the computer system BIOS memory mapping software to store the base address of the non-prefetchable address range used by the AGP master (graphics controller). Figure 18E illustrates the functional description of the bits used in the Memory Base Register 1812. System BIOS bus enumeration software allocates a block of physical memory above the top of memory (TOM) based upon the requirements found in the AGP master's base address register (BAR). The BIOS places the base address of the block of memory in this register. It also places the address of the top of the address range in a Memory Limit Register 1814. Given this information, the core logic chipset 204 can use these two addresses to decode cycles to the AGP master's non-prefetchable memory space. This non-prefetchable memory is where the master's control registers and FIFO-like communication interfaces are mapped. The memory address range may reside on 1 MB boundaries.

[0093] The Memory Limit Register 1814 is used by the computer system BIOS memory mapping software to store the top address of the non-prefetchable address range used by the AGP master (graphics controller). Figure 18F illustrates the functional description of the bits used in the Memory Limit Register 1814. System BIOS bus enumeration software allocates a block of physical memory above the top of memory (TOM) based upon the requirements found in the master's base address register (BAR). BIOS places the top address of the block of memory in this register. It also places the address of the base of the address range in the Memory Base Register 1812.
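The base/limit pair and its 1 MB granularity can be sketched as follows. Figures 18E and 18F are not reproduced here, so this assumes the standard PCI-to-PCI bridge encoding, in which bits 15:4 of each 16-bit register hold address bits 31:20 and the limit's low 20 bits are implied to be all ones; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed encoding: register bits 15:4 = address bits 31:20. */
static uint32_t window_base(uint16_t base_reg)
{
    return (uint32_t)(base_reg & 0xFFF0u) << 16;
}

static uint32_t window_limit(uint16_t limit_reg)
{
    return ((uint32_t)(limit_reg & 0xFFF0u) << 16) | 0xFFFFFu;  /* inclusive top */
}

/* BIOS side: encode a 1 MB aligned [base, top] range into the registers. */
void bridge_encode(uint32_t base, uint32_t top, uint16_t *base_reg, uint16_t *limit_reg)
{
    *base_reg  = (uint16_t)((base >> 16) & 0xFFF0u);
    *limit_reg = (uint16_t)((top  >> 16) & 0xFFF0u);
}

/* Chipset side: does a memory cycle at addr fall in the AGP master's window? */
bool bridge_decodes(uint32_t addr, uint16_t base_reg, uint16_t limit_reg)
{
    return addr >= window_base(base_reg) && addr <= window_limit(limit_reg);
}
```

Encoding the range 0xF4000000 to 0xF4FFFFFF gives a base value of 0xF400 and a limit value of 0xF4F0; any address in that 16 MB span then decodes to the AGP side.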

[0094] A Prefetchable Memory Base Register 1816 is used by the computer system BIOS memory mapping software to store the base address of the prefetchable address range used by the AGP master (graphics controller). Figure 18G illustrates the functional description of the bits used in the Prefetchable Memory Base Register 1816. System BIOS bus enumeration software may allocate a block of memory addresses above the top of memory (TOM) based upon the requirements found in a master's base address register (BAR), or may use a look-up table to determine the block of memory addresses based upon the type of AGP device indicated in its configuration registers (see Figure 22A). BIOS places the base address of the block of memory in the Prefetchable Memory Base Register 1816. It also places the address of the top of the address range in a Prefetchable Memory Limit Register 1818. Given this information, the core logic chipset 204 can use these two addresses to decode cycles to the AGP master's prefetchable memory space. This prefetchable memory is where the graphics controller's Local Frame Buffer 208 is mapped. The memory address range may reside on 1 MB boundaries.

[0095] The Prefetchable Memory Limit Register 1818 is used by the computer system BIOS memory mapping software to store the top address of the prefetchable address range used by the AGP master (graphics controller). Figure 18H illustrates the functional description of the bits used in the Prefetchable Memory Limit Register 1818. System BIOS bus enumeration software allocates a block of memory addresses above the top of memory (TOM) based upon the requirements found in the AGP master's base address register (BAR), or may use a look-up table to determine the block of memory addresses based upon the type of AGP device indicated in its configuration registers (see Figure 22A). BIOS places the top address of the block of memory in this register. It also places the address of the base of the address range in the Prefetchable Memory Base Register 1816. Given this information, the core logic chipset 204 can use these two addresses to decode cycles to the AGP master's prefetchable memory space. This prefetchable memory is where the graphics controller's Local Frame Buffer is mapped. The memory address range may reside on 1 MB boundaries.

[0096] A Capabilities Pointer Register 1820 provides an offset pointer to the first function supported by this device, in accordance with the New Capabilities mechanism as described by the PCI 2.1 Specification (reference: ECN defining "New Capabilities"). Figure 18I illustrates the functional description of the bits used in the Capabilities Pointer Register 1820. AGP is a function supported by the New Capabilities ECN Specification.

[0097] An AGP Capability Identifier Register 1822 identifies this function in the capabilities list to be the AGP function. Figure 18J illustrates the functional description of the bits used in the AGP Capability Identifier Register 1822. It also provides a pointer to the next function in the capabilities list and cites the AGP Specification revision number conformed to by the AGP device.
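Software locates the AGP registers by walking the New Capabilities linked list. A minimal sketch operating on a 256-byte image of configuration space, assuming the standard PCI layout (Capabilities List bit 4 of the Status register at 06h, Capabilities Pointer at 34h, each entry holding an ID byte followed by a next-pointer byte) and the PCI-assigned AGP capability ID of 02h; the function name is illustrative.

```c
#include <stdint.h>

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10   /* Status bit 4: capabilities list present */
#define PCI_CAP_PTR         0x34
#define PCI_CAP_ID_AGP      0x02

/* Return the config-space offset of capability cap_id, or 0 if absent. */
uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    uint16_t status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
    if (!(status & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t ptr = (uint8_t)(cfg[PCI_CAP_PTR] & 0xFC);
    /* Guard against malformed (looping) lists. */
    for (int guard = 0; ptr != 0 && guard < 48; guard++) {
        if (cfg[ptr] == cap_id)                 /* byte 0: capability ID */
            return ptr;
        ptr = (uint8_t)(cfg[ptr + 1] & 0xFC);   /* byte 1: next pointer */
    }
    return 0;
}
```

This is why the Capabilities Pointer Supported bit in the Status Register must be set: without it, software does not follow the pointer at 34h at all.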

[0098] An AGP Status Register 1824 provides status of AGP functionality for the AGP device. Figure 18K illustrates the functional description of the bits used in the AGP Status Register 1824. Information reported includes maximum request queue depth, sideband addressing capabilities, and transfer rates. The AGP Status Register 1824 is a read-only register. Writes have no effect, and reserved or unimplemented fields return zero when read.

[0099] An AGP Command Register 1826 allows software to enable sideband addressing, enable AGP, and set the AGP transfer rate. Figure 18L illustrates the functional description of the bits used in the AGP Command Register 1826.
[0100] An AGP Device Address Space Size Register 1828 determines the size of AGP Device Address Space to be allocated by system BIOS. Figure 18M illustrates the functional description of the bits used in the AGP Device Address Space Size Register 1828. The AGP Device Address Space Size Register 1828 also may determine whether an AGP device is valid in the computer system.
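Configuration software typically reads the AGP Status registers of both the core logic (target) and the graphics controller (master) and writes a common value to their Command registers. The sketch below assumes the field layout of the AGP 2.0 interface specification (request queue in bits 31:24, SBA in bit 9, AGP enable in bit 8 of the command register, rates 1x/2x/4x in bits 2:0), since Figures 18K and 18L are not reproduced here; treat it as illustrative only.

```c
#include <stdint.h>

/* Assumed AGP 2.0 field layout; not taken from Figures 18K/18L. */
#define AGP_RQ_MASK   0xFF000000u  /* status: max requests; command: RQ depth */
#define AGP_SBA       (1u << 9)    /* status: SBA supported; command: SBA enable */
#define AGP_ENABLE    (1u << 8)    /* command only: enable AGP */
#define AGP_RATE_MASK 0x7u         /* bit 0 = 1x, bit 1 = 2x, bit 2 = 4x */

/* Build a command value from target (chipset) and master status values.
 * Returns 0 (AGP left disabled) if no transfer rate is common to both. */
uint32_t agp_negotiate(uint32_t target_status, uint32_t master_status)
{
    uint32_t common = target_status & master_status & AGP_RATE_MASK;
    uint32_t cmd = target_status & AGP_RQ_MASK;  /* master may not exceed target queue */

    if (target_status & master_status & AGP_SBA)
        cmd |= AGP_SBA;                          /* both support sideband addressing */

    if (common & 4u)      cmd |= 4u;             /* pick fastest common rate */
    else if (common & 2u) cmd |= 2u;
    else if (common & 1u) cmd |= 1u;
    else                  return 0;

    return cmd | AGP_ENABLE;
}
```

A target reporting 0x1F000207 (queue field 1Fh, SBA, 1x/2x/4x) paired with a master reporting 0x00000203 (SBA, 1x/2x) yields 0x1F000302: 2x, sideband addressing, AGP enabled.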

[0101] Referring now to Figure 19A, a schematic table of memory-mapped registers, according to an embodiment of the present invention, is illustrated. The chipset's memory-mapped control registers illustrated in Figure 19A are accessed via the address residing in BAR1 1704 (Figure 17A) in the Host to PCI bridge 306 (function 0) configuration header (offset 14h). This address is determined and written to BAR1 1704 by system BIOS. The registers within this system memory 106 space may be used by the GART miniport driver to control AGP functionality within the chipset 204 during run-time. An advantage of storing information in the system memory-mapped registers is that the processor 102 accesses these memory-mapped registers with memory accesses, its fastest mechanism for data retrieval. This may be important for the run-time accessible registers like the cache control registers (not illustrated).
[0102] A Revision ID Register 1902 is provided so that the GART miniport driver can identify the format and features provided by the chipset-specific AGP control registers. Figure 19B illustrates the functional description of the bits used in the Revision ID Register 1902.

[0103] A GART Capabilities Register 1904 defines the GART features supported by the core logic chipset. Figure 19C illustrates the functional description of the bits used in the GART Capabilities Register 1904.
[0104] An AGP Feature Control Register 1906 enables the GART features supported by the chipset 204. Figure 19D illustrates the functional description of the bits used in the AGP Feature Control Register 1906.
[0105] An AGP Feature Status Register 1908 is used to record status information for AGP and GART related events. Figure 19E illustrates the functional description of the bits used in the AGP Feature Status Register 1908. A bit is reset whenever a logic "1" is written to that bit.

[0106] A GART Table/Directory Base Address Register 1910 provides the physical address for the GART table/directory in system memory. Figure 19F illustrates the functional description of the bits used in the GART Table/Directory Base Address Register 1910. In systems using single-level address translation, this register corresponds to the base address of the GART table. In systems using two-level address translation, this register corresponds to the base address of the GART directory. This register is initialized by the GART miniport driver whenever memory for the GART table/directory is allocated. Refer to the Software Specification description hereinbelow for a more detailed description of GART table memory allocation.

[0107] A GART Directory/Table Cache Size Register 1912 identifies the maximum number of entries which can be cached by the core logic chipset in the GART directory and the GART table caches. Figure 19G illustrates the functional description of the bits used in the GART Directory/Table Cache Size Register 1912.

[0108] A GART Directory/Table Cache Control Register 1914 provides software with a mechanism to invalidate the entire GART directory and table caches, thereby maintaining coherency with the GART directory and table in system memory. Figure 19H illustrates the functional description of the bits used in the GART Directory/Table Cache Control Register 1914. In systems using a single-level address translation, this register only applies to the GART table cache. In systems using two-level address translation, this register applies to both the GART directory cache and the GART table cache.

[0109] A GART Table Cache Entry Control Register 1916 is used by software to update/invalidate a specific GART table cache entry. Figure 19I illustrates the functional description of the bits used in the GART Table Cache Entry Control Register 1916. When the GART miniport driver receives a call to update/invalidate entries in the GART table, it is required to maintain coherency of the GART table cache. If the updated/invalidated entry is not present in the GART cache, the invalidate function will have no effect. The GART miniport driver must perform only 32-bit write accesses to this register.

[0110] A Posted Write Buffer Control Register 1918 gets set by the GART miniport driver to flush the chipset's processor-to-memory posted write buffers. Figure 19J illustrates the functional description of the bits used in the Posted Write Buffer Control Register 1918. This is necessary during mapping of a GART table entry. When the processor writes a valid entry to the GART table, the data can get placed in the chipset's posted write buffers. If the graphics controller tries to access the GART table entry that is posted, the entry will not be valid and an error occurs. A similar problem occurs when the processor clears a GART table entry. If the data gets posted and the graphics controller tries to access that GART table entry, the returned data may be corrupt.
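Taken together, paragraphs [0109] and [0110] imply an ordering for remapping one entry: write the entry, flush the posted write buffers, then invalidate the single cached copy. The sketch below shows only that ordering. The register struct, its field order, the bit values, and the choice to place the device page address in the invalidate write are all HYPOTHETICAL stand-ins for Figures 19A, 19I and 19J, and real hardware may also require polling for flush completion.

```c
#include <stdint.h>

/* HYPOTHETICAL register model; real offsets and bits are chipset-defined. */
struct agp_mmio {
    volatile uint32_t gart_entry_ctrl;    /* stands in for register 1916 */
    volatile uint32_t posted_write_ctrl;  /* stands in for register 1918 */
};

#define HYP_ENTRY_INVALIDATE (1u << 0)    /* hypothetical command bit */
#define HYP_FLUSH_POSTED     (1u << 0)    /* hypothetical flush bit */

void gart_update_entry(volatile uint32_t *gart_table, struct agp_mmio *regs,
                       uint32_t agp_base, uint32_t dev_addr, uint32_t new_entry)
{
    uint32_t index = (dev_addr - agp_base) >> 12;

    /* 1. Update the GART table entry in system memory. */
    gart_table[index] = new_entry;

    /* 2. Flush processor-to-memory posted writes so the chipset cannot
     *    fetch a stale or half-written entry. */
    regs->posted_write_ctrl = HYP_FLUSH_POSTED;

    /* 3. Invalidate the one cached copy with a single 32-bit write;
     *    harmless if the entry is not currently cached. */
    regs->gart_entry_ctrl = (dev_addr & 0xFFFFF000u) | HYP_ENTRY_INVALIDATE;
}
```

Invalidating a single entry, rather than the whole cache through register 1914, leaves the other cached translations intact for the remaining graphics streams.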

[0111] An AGP Bus Utilization/Bandwidth/Latency Command Register 1920, illustrated in Figure 19K, controls the AGP bus utilization, bandwidth, and latency counters in the core logic chipset 204. There may be three 32-bit counters provided to measure the AGP bus utilization, bandwidth, and latency. Each base 32-bit counter is clocked (incremented) using the 66 MHz AGP clock, which will count for 60 seconds. To measure utilization, bandwidth, or latency, the value in the utilization counters after the base counter expires should be multiplied by 15 ns and divided by 60. The utilization, bandwidth, and latency counters can be initialized and enabled using this register. A clear utilization register bit clears all the counters. AGP Bus Utilization, Bandwidth, and Latency Registers 1922, 1924 and 1926, respectively illustrated in Figures 19L-19N, are counters which may be independently started by setting the corresponding bits in the AGP Bus Utilization/Bandwidth/Latency Command Register 1920. The counting continues in the AGP Bus Utilization, Bandwidth, and Latency Registers 1922, 1924 and 1926 until the corresponding bits in the AGP Bus Utilization/Bandwidth/Latency Command Register 1920 are cleared to a logic "0".
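The conversion recipe above (count x 15 ns / 60 s) gives the fraction of the measurement window during which the counted condition held. A 32-bit counter is wide enough: 66 MHz for 60 seconds is about 3.96 x 10^9 clocks, below 2^32 (about 4.29 x 10^9). A minimal sketch, with the 15 ns clock period and 60 s window taken from the text and the function name illustrative:

```c
#include <stdint.h>

#define AGP_CLK_NS 15.0   /* one 66 MHz AGP clock, approximately */
#define WINDOW_S   60.0   /* measurement window of the base counter */

/* Fraction (0.0 to 1.0) of the window during which the counter ran. */
double agp_counter_fraction(uint32_t count)
{
    return (double)count * AGP_CLK_NS * 1e-9 / WINDOW_S;
}
```

A utilization counter reading of 2,000,000,000 corresponds to 30 seconds of activity, i.e. 50% bus utilization.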

[0112] The AGP Bus Utilization Register 1922 holds the AGP bus utilization counter value, which is incremented every AGP bus clock when the AGP AD[31:0] bus is active with either one of the transactions illustrated in Figure 19L.
[0113] The AGP Bus Bandwidth Register 1924 holds the AGP bus bandwidth counter value, which is incremented every AGP bus clock when the AGP AD[31:0] bus is active as illustrated in Figure 19M.

[0114] The AGP Bus Latency Register 1926 holds the AGP bus latency counter value, which is incremented for every AGP bus clock that expires while the chipset is processing a particular AGP read request. The AGP bus latency counter value represents the time it takes to process an AGP transaction, starting at the time the read request is enqueued and completing when the first quadword of data is returned to the master. Preferably, the core logic chipset 204 tracks a particular AGP read request starting from the time it is enqueued and ending when the first quadword of data is returned to the AGP master.
GART Table Caching and Prefetching

[0115] Latency of AGP cycles would suffer greatly if each AGP request required a GART table/directory lookup. In a system using single-level address translation, a GART table entry fetch from memory adds a minimum of 16 AGP clocks (66 MHz) to an AGP request. This gets worse when the additional time required to arbitrate for the memory bus and time for refresh cycles is taken into account. It is preferred to cache (save) GART table entries to prevent this problem and improve performance. This is illustrated in Figure 20. Likewise, it is also preferred to cache both GART table and GART directory entries in systems using two-level address translation. Since each GART directory entry covers 4 MB of address space, the GART directory cache need not be as big as the GART table cache.

[0116] The need for GART caching becomes more evident when it is understood that the minimum AGP request size is 8 bytes of data. As a worst case, 512 AGP requests (4 KB / 8 bytes) could access the same 4 KB page in physical memory. By fetching and caching the necessary GART table and directory entries to service the first request, the next 511 requests would not require a GART table or directory lookup. Thus, caching a single entry greatly improves performance. Note that this assumes textures reside contiguously in physical memory and span 4 KB pages. Increasing the cache size will further improve system performance.

[0117] Graphics controllers typically will identify four streams, at minimum, that will be accessing AGP memory via the GART table: CPU, video, textures, and command lists. Given this, a preferred embodiment of the core logic chipset 204 will have, at minimum, a four-way set associative GART table cache to prevent thrashing. In systems with two-level address translation, the GART directory cache should preferably have at least four entries, one for each stream.
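A four-way set-associative GART table cache of the kind described above might look like the following sketch. The number of sets, the LRU replacement policy, and all names are assumptions for illustration, not the chipset's actual design. Each way holds one cached GART table entry tagged by AGP device page number (address bits 31:12), so all 512 minimum-size (8-byte) requests that fall within one 4 KB page hit on a single cached entry.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GART_WAYS 4   /* four-way set associative, one way per stream */
#define GART_SETS 4   /* hypothetical: 16 cached entries in total */

struct gart_line {
    bool     valid;
    uint32_t tag;       /* AGP device page number (address bits 31:12) */
    uint32_t entry;     /* cached copy of the GART table entry */
    uint32_t last_use;  /* timestamp used for LRU replacement */
};

struct gart_cache {
    struct gart_line line[GART_SETS][GART_WAYS];
    uint32_t clock;
};

void gart_cache_init(struct gart_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Look up the translation for an AGP device address.
   On a hit, store the cached GART entry in *entry and return true. */
bool gart_cache_lookup(struct gart_cache *c, uint32_t dev_addr, uint32_t *entry)
{
    uint32_t page = dev_addr >> 12;
    struct gart_line *set = c->line[page % GART_SETS];
    for (int w = 0; w < GART_WAYS; w++) {
        if (set[w].valid && set[w].tag == page) {
            set[w].last_use = ++c->clock;
            *entry = set[w].entry;
            return true;
        }
    }
    return false;   /* miss: a GART table fetch from memory is required */
}

/* Install an entry fetched from the GART table, replacing an invalid way
   if one exists, otherwise the least recently used way. */
void gart_cache_fill(struct gart_cache *c, uint32_t dev_addr, uint32_t entry)
{
    uint32_t page = dev_addr >> 12;
    struct gart_line *set = c->line[page % GART_SETS];
    int victim = 0;
    for (int w = 0; w < GART_WAYS; w++) {
        if (!set[w].valid) { victim = w; break; }
        if (set[w].last_use < set[victim].last_use) victim = w;
    }
    set[victim].valid = true;
    set[victim].tag = page;
    set[victim].entry = entry;
    set[victim].last_use = ++c->clock;
}
```

If the CPU, video, texture, and command-list streams all index the same set, four ways let each stream keep its translation resident; a fifth stream evicts only the least recently used entry.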

[0118] Prefetching GART table entries also may improve performance. Prefetching occurs when the chipset retrieves the next GART table entry while fetching the GART table entry required to service the current AGP request. This entry is cached along with past GART table entries. Overhead for prefetching this extra entry is negligible considering that each GART table entry is 4 bytes wide while the typical memory data bus is 8 bytes wide, meaning that two GART table entries are retrieved with a single request. In addition, some chipsets burst an entire cache line (32 bytes, or eight GART table entries) when reading data from memory. In this case, seven GART table entries could easily be prefetched. Prefetching GART table entries is illustrated in Figure 21.
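The arithmetic behind the prefetching argument is simple enough to state in code. This sketch assumes GART table reads are naturally aligned to the burst size, so the extra entries are the neighbors sharing the requested entry's 8-byte or 32-byte block; because of that alignment, some of them may precede the requested entry rather than follow it. The function names are hypothetical.

```c
#include <stdint.h>

#define GART_ENTRY_BYTES 4u   /* each GART table entry is 4 bytes wide */

/* Number of GART table entries returned by one memory read of burst_bytes. */
uint32_t gart_entries_per_read(uint32_t burst_bytes)
{
    return burst_bytes / GART_ENTRY_BYTES;
}

/* Entries that arrive alongside the one actually needed. */
uint32_t gart_prefetched_entries(uint32_t burst_bytes)
{
    uint32_t n = gart_entries_per_read(burst_bytes);
    return n > 0 ? n - 1u : 0u;
}

/* Index of the first entry in the naturally aligned read that contains
   entry idx; the read covers [first, first + entries_per_read). */
uint32_t gart_read_first_entry(uint32_t idx, uint32_t burst_bytes)
{
    uint32_t n = gart_entries_per_read(burst_bytes);
    return n > 0 ? idx - (idx % n) : idx;
}
```

With an 8-byte data bus one extra entry comes for free; with a 32-byte cache-line burst, seven do.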

Core Logic Chipset Data Coherency

[0119] The core logic chipset 204 will preferably ensure that read accesses from the AGP bus 207 are coherent with write accesses from the host processor bus 103, so long as both devices are accessing AGP memory through the AGP device address range. For example, a read request from the AGP bus 207 will pull out the most recently written data from the host bus 103, provided both data transfers use the AGP device address space (GART table translation). The device address range should preferably be marked uncacheable in the host processor 102. This ensures that the core logic chipset 204 does not need to snoop the host processor 102 cache(s) for each AGP stream access on the AGP bus 207. If the host processor accesses AGP memory directly outside the virtual graphics address range, the host processor will most likely have this region marked as writeback cacheable, and will cache these accesses. Since the core logic chipset does not snoop the host processor caches for AGP stream accesses on the AGP bus, coherency problems may occur.

[0120] The core logic chipset 204 preferably ensures that read accesses from the host bus 103 and the PCI bus 109 are coherent with AGP stream write accesses on the AGP bus 207 by use of the AGP Flush Command only. Once an AGP Flush Command is retired on the AGP bus 207, all previously retired AGP write data will become available to devices on the host and PCI buses 103 and 109, respectively. Without the use of the AGP Flush Command, coherency problems may occur.

AGP Graphics Controller 

[0121] In conjunction with the preferred embodiments of the present invention, an AGP graphics controller is preferably implemented in accordance with the following specification:

[0122] Issue AGP requests on cache line boundaries to improve performance. The core logic chipset is typically optimized for cache line transfers in and out of memory. If the AGP master requests read data and the transaction size crosses a cache line boundary, two cache line memory reads are required to fetch the data. This is inefficient, particularly when the master runs back-to-back cache line reads off cache line boundaries. The inefficiency due to non-cache line aligned transactions is minimized as the size of the request increases.

[0123] AGP requests may range in size from 8 bytes to 32 quad words (QW) for reads and up to 8 QW for writes. This means it is impossible for the graphics controller to issue all requests on cache line boundaries. It is preferred that the chipset perform combined reordering of reads to minimize the performance impact of requests less than 4 QW in size.

[0124] Issue cache line or multiple cache line sized AGP requests to improve performance. The core logic chipset is typically optimized for 32-byte (cache line) accesses to main system memory. Whenever possible, an AGP compliant master preferably performs 32-byte address aligned accesses with data transfer lengths that are multiples of 32 bytes. This may maximize bandwidth between main system memory and the AGP bus.
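The cost of misaligned requests described in paragraphs [0122] to [0124] can be quantified by counting the 32-byte cache lines a request touches. A minimal sketch, assuming a 32-byte line and the 8-byte quad word used throughout this section; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* Pentium / Pentium Pro cache line size */
#define QW_BYTES          8u   /* one AGP quad word */

/* Number of cache lines the chipset must read to service a request
   of len bytes starting at addr. */
uint32_t cache_lines_touched(uint32_t addr, uint32_t len)
{
    if (len == 0)
        return 0;
    uint32_t first = addr / CACHE_LINE_BYTES;
    uint32_t last = (addr + len - 1u) / CACHE_LINE_BYTES;
    return last - first + 1u;
}

/* True when a request is 32-byte aligned and a multiple of 32 bytes long,
   the access pattern the chipset is optimized for. */
bool is_cache_line_friendly(uint32_t addr, uint32_t len)
{
    return len != 0 && addr % CACHE_LINE_BYTES == 0 && len % CACHE_LINE_BYTES == 0;
}
```

A 32-QW (256-byte) read starting on a line boundary touches 8 lines; the same read starting 8 bytes later touches 9, and a single 8-byte request straddling a boundary costs two line reads.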

[0125] Use the SBA request queuing mechanism instead of PIPE#. A preferred host bridge AGP target request queue is capable of holding multiple requests. In order to maximize target efficiency, the request queue should preferably be kept as full as possible. This is preferably accomplished using sideband request enqueueing in order to take advantage of the speed of the AGP 2X mode and also to avoid AD bus arbitration overhead.

[0126] If the graphics controller 210 can generate PCI cycles, implement the PCI Cache Line Size register (configuration space offset 0Ch) and use the most efficient PCI write and read commands possible. Pentium and Pentium Pro systems use a cache line size of 32 bytes, so preferably at least this size should be supported.
[0127] The Memory Write and Invalidate (MWI) command helps write burst performance, especially on Pentium Pro-based systems where the CPU cache snoop overhead is high. It allows the host bridge to ignore CPU cache writeback data; once the CPU recognizes the snoop address, the host bridge can write data from the PCI stream into memory. This command is preferred so as to burst multiple cache lines without disconnects.

[0128] The Memory Read Line (MRL) and Memory Read Multiple (MRM) commands cause the host bridge to prefetch additional cache lines from memory. This speeds up read bursts, allowing bursts to continue without disconnects in a larger number of situations. Without these commands, CPU cache snoops hold up bursts. Prefetching hides the snoop time during the previous cache line data transfers.
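The following sketch shows one way a graphics controller could pick among the PCI memory commands discussed in paragraphs [0126] to [0128]. The command encodings are the standard PCI C/BE[3:0]# values; the selection policy (Memory Read for a single DWORD, Memory Read Line within one line, Memory Read Multiple across lines, MWI only for whole aligned lines with MWI enabled) is an assumption, not one stated in this document, and the function names are hypothetical. The Cache Line Size register is programmed in DWORDs, so a value of 8 means 32 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI bus command encodings (C/BE[3:0]# during the address phase). */
enum pci_cmd {
    PCI_MEM_READ             = 0x6,
    PCI_MEM_WRITE            = 0x7,
    PCI_MEM_READ_MULTIPLE    = 0xC,
    PCI_MEM_READ_LINE        = 0xE,
    PCI_MEM_WRITE_INVALIDATE = 0xF
};

/* cls_reg is the Cache Line Size register (offset 0Ch), in DWORDs. */
static uint32_t line_bytes(uint8_t cls_reg)
{
    return (uint32_t)cls_reg * 4u;
}

enum pci_cmd pick_read_cmd(uint32_t addr, uint32_t len, uint8_t cls_reg)
{
    uint32_t line = line_bytes(cls_reg);
    if (line == 0 || len <= 4u)
        return PCI_MEM_READ;                 /* no prefetch hint is useful */
    uint32_t lines = (addr % line + len + line - 1u) / line;
    return lines > 1u ? PCI_MEM_READ_MULTIPLE : PCI_MEM_READ_LINE;
}

enum pci_cmd pick_write_cmd(uint32_t addr, uint32_t len, uint8_t cls_reg,
                            bool mwi_enabled)
{
    uint32_t line = line_bytes(cls_reg);
    if (mwi_enabled && line != 0 && addr % line == 0 && len != 0 && len % line == 0)
        return PCI_MEM_WRITE_INVALIDATE;     /* whole, aligned lines only */
    return PCI_MEM_WRITE;
}
```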

[0129] Referring now to Figure 22A, a schematic table of the AGP graphics controller 210 configuration registers, according to an embodiment of the present invention, is illustrated. The AGP configuration registers in the graphics controller 210 contain information needed to configure AGP bus parameters for the AGP master. A Capabilities Pointer Register 2202 provides an offset pointer to the first function supported by this device in accordance with the New Capabilities mechanism as described by the PCI 2.1 Specification (reference: ECN defining "New Capabilities"). AGP is a function supported by the New Capabilities. Figure 22B illustrates the functional description of the bits used in the Capabilities Pointer Register 2202.

[0130] An AGP Capability Identifier Register 2204 identifies this function in the capabilities list to be the AGP function. Figure 22C illustrates the functional description of the bits used in the AGP Capability Identifier Register 2204. The AGP Capability Identifier Register 2204 also provides a pointer to the next function in the capabilities list and cites the AGP Specification revision number conformed to by this device.

[0131] An AGP Status Register 2206 provides status of AGP functionality for this device. Figure 22D illustrates the functional description of the bits used in the AGP Status Register 2206. Information reported includes maximum request queue depth, sideband addressing capabilities, and transfer rates. This AGP status register is preferably a read-only register. Writes have no effect, and reserved or unimplemented fields return zero when read.

[0132] An AGP Command Register 2208 allows software to enable sideband addressing, enable AGP, and set the AGP transfer rate. Figure 22E illustrates the functional description of the bits used in the AGP Command Register 2208. These bits are set by the operating system during initialization.
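Software typically reaches these registers by walking the New Capabilities list from the Capabilities Pointer at configuration offset 34h until it finds capability ID 02h (AGP); the AGP Status and Command registers then follow at offsets +4 and +8 from that capability. The sketch below performs that walk over a configuration-space image and then derives a master Command value from master and target Status values. The bit positions (RQ in bits 31:24, SBA in bit 9, AGP enable in bit 8, 1X/2X rate in bits 1:0) are assumed from the AGP 1.0 interface specification rather than taken from Figures 22D and 22E, and should be checked against them; the function names are hypothetical.

```c
#include <stdint.h>

#define PCI_STATUS_REG       0x06
#define PCI_STATUS_CAP_LIST  0x0010u       /* Status bit 4: capabilities list */
#define PCI_CAP_PTR_REG      0x34
#define PCI_CAP_ID_AGP       0x02

/* Assumed AGP 1.0 Status/Command field positions. */
#define AGP_RQ_MASK   0xFF000000u          /* Status: RQ, Command: RQ_DEPTH */
#define AGP_SBA       (1u << 9)            /* Status: SBA, Command: SBA_ENABLE */
#define AGP_ENABLE    (1u << 8)            /* Command only */
#define AGP_RATE_1X   (1u << 0)
#define AGP_RATE_2X   (1u << 1)

/* Walk a 256-byte configuration space image; return the offset of the
   AGP capability (its Capability Identifier register), or 0 if absent. */
uint8_t find_agp_capability(const uint8_t cfg[256])
{
    uint16_t status = (uint16_t)(cfg[PCI_STATUS_REG] | (cfg[PCI_STATUS_REG + 1] << 8));
    if (!(status & PCI_STATUS_CAP_LIST))
        return 0;
    uint8_t ptr = (uint8_t)(cfg[PCI_CAP_PTR_REG] & 0xFCu);
    for (int hops = 0; ptr != 0 && hops < 48; hops++) {   /* bound guards loops */
        if (cfg[ptr] == PCI_CAP_ID_AGP)
            return ptr;
        ptr = (uint8_t)(cfg[ptr + 1] & 0xFCu);             /* next pointer */
    }
    return 0;
}

/* Compute a master's AGP Command value from master and target Status. */
uint32_t agp_command_value(uint32_t master_status, uint32_t target_status)
{
    uint32_t common = master_status & target_status;
    uint32_t cmd;
    if (common & AGP_RATE_2X)
        cmd = AGP_RATE_2X;                  /* fastest common rate */
    else if (common & AGP_RATE_1X)
        cmd = AGP_RATE_1X;
    else
        return 0;                           /* no common rate: leave AGP off */
    if (common & AGP_SBA)
        cmd |= AGP_SBA;                     /* sideband addressing on both ends */
    cmd |= target_status & AGP_RQ_MASK;     /* master queue depth = target RQ */
    return cmd | AGP_ENABLE;
}
```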

AGP Latency

[0133] Intel's AGP Specification version 1.0 does not specify latency for AGP cycles. For the purpose of disclosing the present invention, AGP latency is defined as the number of AGP bus clocks (66 MHz) occurring from the time a single request is enqueued until the first double word of data (for the corresponding request) is presented to the AGP master. Latency begins when the request gets placed by the AGP master on either the AD or the SBA buses (depending upon which AGP addressing scheme is being used) and PIPE# or SBA (respectively) is active. Latency terminates when TRDY# is active and the first double word of data for the corresponding request is placed on the AD bus. Latency is defined only in terms of AGP read cycles because write cycles get posted in the core logic chipset 204. Figure 23 illustrates expected latencies for best, typical, and worst cases.

[0134] Best case latency may be computed by assuming a GART cache hit and a memory subsystem page hit while retrieving the targeted data (i.e., no precharge). It also assumes that the AD bus is available, the request is aligned on a cache line, and the core logic chipset memory bus arbiter grants the AGP request access to the memory bus immediately. CAS# latency used in the computation is 2 clocks.

[0135] Typical latency assumes the AD bus is available immediately, the request is aligned on a cache line, a GART cache hit, and a memory subsystem page miss (i.e., precharge and activate required). In this case, the AGP request must wait for a pending processor to memory or PCI bus to memory cycle to complete before being granted the memory bus by the arbiter. Precharge and activate penalties are included. CAS# latency used in the computation is 2 clocks.
[0136] Worst case latency assumes the AD bus is available immediately, the request is aligned on a cache line boundary, a GART cache miss (i.e., GART table entry lookup required), and a page miss (i.e., precharge and activate required). In this case, the GART table entry lookup must wait for a pending processor to memory or PCI to memory cycle to complete before being granted the memory bus. Once the memory bus is available, the chipset performs the GART table entry read. The AGP request must then wait for another processor or PCI to memory cycle and a refresh cycle to complete before being granted the memory bus. Once access to the memory bus is granted, the AGP data is read from memory. Precharge and activate penalties are included. CAS# latency used in the computation is 2 clocks.

Software Description 

[0137] Key components of the AGP software architecture include System BIOS, the chipset miniport driver, the operating system, and the graphics or Direct Draw driver. These components are required to initialize and control the AGP and GART table functions within the chipset and graphics controller as illustrated in Figure 18A. The disclosure hereinafter discusses the principal AGP software components. It primarily examines both the system BIOS and the GART miniport driver. It briefly describes the operating system/API and the graphics controller driver as applied to AGP.


System BIOS 

[0138] During boot, System BIOS power-on self-test (POST) performs the following AGP functions: 1) enables the core logic chipset's AGP error reporting, and 2) may configure the core logic chipset with the size of the AGP device address space (optional). Each of these functions is described in more detail below.

Enabling Error Reporting 

[0139] When the graphics controller attempts to access a page in AGP memory that is not valid, the chipset can either ignore the failure and continue processing or generate SERR#. Because this feature is platform specific, system BIOS is responsible for setting the appropriate registers (as opposed to the GART miniport driver). It configures the system to generate SERR# upon AGP failure using the following algorithm:

1. System BIOS first determines if AGP error reporting is supported by reading the chipset's Valid Bit Error Reporting Supported bit (bit 0) in the AGP Capabilities register 1904 (see Figures 19A and 19C). When this bit is set to 1, the chipset is capable of generating SERR# when the graphics controller attempts to access an invalid page in AGP memory.

2. If generating SERR# is supported, System BIOS can enable the chipset's SERR# generation by setting the Valid Bit Error Reporting Enable bit (bit 0) in the AGP Feature Control register 1906 to 1 (see Figures 19A and 19D). Setting this bit to 0 will cause the system to ignore the failure and continue processing the request.
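The BIOS algorithm above reduces to a read of one capability bit and a read-modify-write of one control bit. This sketch assumes registers 1904 and 1906 are the byte-wide AGP Capabilities and AGP Feature Control registers that the GART miniport driver description places at offsets 01h and 02h of the AGP memory-mapped register block; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed byte offsets within the AGP memory-mapped register block. */
#define AGP_CAPABILITIES_REG      0x01
#define AGP_FEATURE_CONTROL_REG   0x02
#define VALID_BIT_ERROR_REPORTING (1u << 0)   /* bit 0 in both registers */

/* Returns false if the chipset cannot report invalid-page accesses. */
bool bios_config_agp_serr(volatile uint8_t *agp_regs, bool generate_serr)
{
    if (!(agp_regs[AGP_CAPABILITIES_REG] & VALID_BIT_ERROR_REPORTING))
        return false;                                   /* step 1: unsupported */
    uint8_t fc = agp_regs[AGP_FEATURE_CONTROL_REG];     /* keep other bits */
    if (generate_serr)
        fc = (uint8_t)(fc | VALID_BIT_ERROR_REPORTING); /* step 2: SERR# */
    else
        fc = (uint8_t)(fc & ~VALID_BIT_ERROR_REPORTING); /* ignore, continue */
    agp_regs[AGP_FEATURE_CONTROL_REG] = fc;
    return true;
}
```

The read-modify-write matters because the same Feature Control register also carries the GART cache and linking enables that the miniport driver sets later.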

Configuring Size of AGP Device Address Space 

[0140] To reuse existing bus enumeration code and to optimize the amount of virtual and physical memory allocated to AGP, system BIOS can configure the read/write attributes in Base Address Register 0 (BAR0) 1702 in the chipset's Host-PCI bridge configuration header (function 0) (see Figure 17A) prior to execution of the bus enumeration code, assuming the core logic chipset supports this feature. System BIOS uses the following algorithm to do this:

1. Prior to bus enumeration/memory mapping software, determine the make and model of the AGP graphics controller installed in the system. Based upon the graphics controller, BIOS can determine the amount of memory required by AGP.

2. Using the size obtained in step 1, set the appropriate size in the VAS Size bits (bits 2:1) of the AGP Device Address Space Size register 1828, accessed in the chipset's PCI-PCI bridge configuration header (function 1) (see Figures 18A and 18M). When bits 2:1 are modified, the chipset will automatically adjust the read/write attributes in BAR0 1702 of the Host-PCI bridge configuration header (function 0) to reflect the amount of desired memory (see Figures 17A and 17C).

3. If no AGP device was found, set the AGP Valid bit in the AGP Device Address Space Size register to 0 to indicate AGP is invalid. The chipset will automatically update BAR0 1702 of the Host-PCI bridge configuration header to indicate that no memory is required for AGP. The PCI-PCI bridge (function 1) capabilities pointer will be set to point to the next item in the linked list, or null if there is no other item.
4. Bus enumeration code will find the requested size in BAR0 1702 and allocate (as required) this memory in the memory map. The base address of the block of AGP device address space will be placed in BAR0 1702 and will reside on a 32-MB boundary.
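Bus enumeration code discovers the size requested in BAR0 with the standard PCI probe: write all ones to the BAR, read it back, and take the lowest writable address bit as the size. This is why adjusting BAR0's read/write attributes is enough to change the amount of address space allocated. A minimal sketch of the decode and of the natural alignment that places a BAR of 32 MB or more on at least a 32-MB boundary; the function names are hypothetical, and writing the probe value and restoring the BAR afterwards are omitted.

```c
#include <stdint.h>

/* Size of a 32-bit memory BAR, decoded from the value read back after
   software writes all ones to it (standard PCI enumeration probe). */
uint32_t bar_size_from_probe(uint32_t readback)
{
    uint32_t mask = readback & ~0xFu;   /* bits 3:0 are type/prefetch flags */
    if (mask == 0)
        return 0;                       /* no memory requested (AGP invalid) */
    return ~mask + 1u;                  /* lowest writable bit gives the size */
}

/* Round a candidate base up to the BAR's natural (size) alignment. */
uint32_t bar_align_base(uint32_t candidate, uint32_t size)
{
    return (candidate + size - 1u) & ~(size - 1u);
}
```

A read-back of FE000008h, for instance, decodes to 32 MB, and a candidate base of E1000000h is then rounded up to E2000000h.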

[0141] Implementation of the AGP Device Address Space Size register is chipset specific. BIOS must know if its core logic chipset supports configuration of AGP device address space size. If not, then the AGP device address space size is hard coded in BAR0 1702 of the Host-PCI bridge configuration header and no action is required by BIOS.

GART Miniport Driver 


[0142] The GART miniport driver (hereinafter "GART MPD" or "MPD") of the present invention is used by the operating system software to perform the following functions:

• Initializes GART capabilities within the chipset.

• Creates and initializes the GART table.

• Reserves GART table entries.

• Maps GART table entries with allocated 4 KB pages in physical memory.

• Flushes pages in the L1/L2 cache.

• Unmaps GART table entries and maintains GART cache and link bit coherency.

• Frees GART table entries.

• Terminates GART translation upon exit.

[0143] Each of these functions is described in more detail below. Services provided by the GART miniport driver are illustrated in Figures 25A-25F. Services available to the GART miniport driver are illustrated in Figures 26A and 26B. For more information on these services, reference is made to Microsoft's AGP Software Functional Specification. The Microsoft AGP Software Functional Specification is available from Microsoft Corporation, Redmond, Washington, and is hereby incorporated by reference.

Initializing GART Capabilities 


[0144] Upon receipt of the PCIMPInit() call from the operating system, the GART miniport driver (MPD) performs the 
following functions to initialize GART functionality in the chipset: 

1. MPD reads the pointer to AGP Device Address Space from BAR0 in the chipset's Host-PCI bridge configuration header. This pointer points to the base of AGP Device Address Space. The MPD stores this pointer.

2. MPD reads the Device Address Space Size field (bits 2:1) from the chipset's AGP Device Address Space Size register located in the chipset's PCI-PCI bridge configuration header. This field provides the MPD with the amount of device address space allocated to AGP. The MPD stores this value for later use. In a preferred embodiment of the present invention, this value may be 32 MB, 64 MB, 128 MB, 256 MB, 512 MB, 1 GB, or 2 GB.

3. MPD gets a pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's Host to PCI bridge configuration header. This pointer is stored for later use. The MPD also stores the location of the GART table Base Address Register. This register resides at offset 04h in the GART table's memory mapped space.

4. MPD gets a pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's Host to PCI bridge configuration header. Using this pointer, MPD enables the GART table cache by setting the GART Cache Enable bit (bit 3) in the AGP Feature Control Register (offset 02h from pointer) to 1. It is now up to the GART MPD to maintain GART cache coherency.

5. MPD gets a pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's Host to PCI bridge configuration header. Using this pointer, MPD reads the GART Entry Linking Supported bit (bit 1) in the AGP Capabilities register (offset 01h from pointer) to determine if this chipset supports linking. If the chipset supports linking, the MPD sets the GART Entry Linking Enable bit (bit 1) in the AGP Feature Control register (offset 02h from pointer) to 1 to enable the linking/prefetching function. It is now up to the MPD to set link bits as required.
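Steps 4 and 5 amount to read-modify-write operations on two byte-wide registers. The sketch below assumes the MPD has already read BAR1 (configuration offset 14h) through a platform configuration-read service that is not shown, and has mapped the resulting base address to `agp_regs`; all names are hypothetical. `bar_base` simply masks off the flag bits 3:0 of a 32-bit memory BAR, which step 1 also needs for BAR0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offsets within the AGP memory-mapped register block (via BAR1). */
#define AGP_CAPS_REG             0x01
#define AGP_FEATURE_CTRL_REG     0x02
#define CAP_GART_LINK_SUPPORTED  (1u << 1)
#define FC_GART_LINK_ENABLE      (1u << 1)
#define FC_GART_CACHE_ENABLE     (1u << 3)

/* Strip the flag bits (3:0) of a 32-bit memory BAR to get its base address. */
uint32_t bar_base(uint32_t bar_value)
{
    return bar_value & ~0xFu;
}

/* Steps 4 and 5: enable the GART cache and, if supported, entry linking.
   Returns true when linking was enabled, so the MPD must set link bits. */
bool mpd_enable_gart_features(volatile uint8_t *agp_regs)
{
    uint8_t fc = agp_regs[AGP_FEATURE_CTRL_REG];
    fc = (uint8_t)(fc | FC_GART_CACHE_ENABLE);          /* step 4 */
    bool linking = (agp_regs[AGP_CAPS_REG] & CAP_GART_LINK_SUPPORTED) != 0;
    if (linking)
        fc = (uint8_t)(fc | FC_GART_LINK_ENABLE);       /* step 5 */
    agp_regs[AGP_FEATURE_CTRL_REG] = fc;
    return linking;
}
```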
Allocating and Initializing the GART Directory/Table

[0145] Following AGP initialization and upon receipt of the PCIMPReset() call from the operating system, the chipset miniport driver (MPD) performs the following functions to (re)create and initialize the GART directory/table:


1 . MPD allocates "n" 4 KB pages of system memory for the GART table by calling the operating system using the 
PCI AllocatePagesQ command. The MPD must determine "n", how many pages to allocate based upon the number 
of pages of system memory available (provided by the operating system in the PGIMPResel call) and the amount 
of AGP device address space allocated by system BIOS (reference BARO in the chipset's Host-PCI bridge con- 
10 figuration header). Note that systems using two-leve! address translation must have an additional entry allocated 

for the GART directory. 

As disclosed above, the AGP implementation of the present invention supports two types of address translation: one-level address translation (page table) and two-level translation (directory → table → page). In systems using single-level address translation, the GART must be allocated as a single, contiguous block of memory. When using the PCIAllocatePages() service, the MPD must set the PageContig flag to request contiguous pages from the operating system. Preferably, the GART table memory allocation will be performed immediately following operating system startup to ensure that the required contiguous memory will be available. In systems using two-level address translation, the GART table need not be contiguous.

[0146] The MPD sets the PageZeroInit flag in the PCIAllocatePages() service so the operating system will fill the allocated pages with zeros, thus initializing the GART directory/table.

[0147] To maintain L1/L2 cache coherency, the MPD sets the MP_FLUSHES_L2_CACHE flag to indicate the operating system should flush the L1 and L2 caches.

2. In response to the PCIAllocatePages() call, the operating system returns NULL if the request failed or the linear address of the GART table if the call was successful. This linear address is saved for future use by the MPD. The MPD must also convert this linear address to a physical address using the PCILinToDev() command. The MPD then gets the pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's host to PCI bridge configuration header. Using this pointer, MPD writes the base (physical) address for the first 4 KB page allocated to the AGP GART Base Address register (offset 04h from pointer) in the chipset's AGP memory mapped space. In systems using single-level translation, this first entry represents the base of the GART table. In systems using two-level translation, this first entry is the base of the GART directory.

3. In systems using two-level address translation, the MPD must "walk" the returned linear address range, determine the physical address of each 4 KB page just allocated, and write the physical address for the start of each 4 KB page to its corresponding GART directory entry. This fills in the GART directory.
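Choosing "n" in step 1 follows from the entry size: one 4-byte GART table entry per 4 KB page, so each 4 KB table page holds 1024 entries and covers 4 MB of AGP device address space, matching the 4 MB covered by each GART directory entry in paragraph [0115]. The sketch below assumes a single directory page is enough for two-level translation (2 GB / 4 MB = 512 directory entries, which fits in one 4 KB page) and ignores the system-memory limit the MPD must also respect; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES        4096u
#define GART_ENTRY_BYTES  4u
#define ENTRIES_PER_PAGE  (PAGE_BYTES / GART_ENTRY_BYTES)   /* 1024 = 4 MB */

/* Pages to request: one table entry per 4 KB of AGP device address space,
   plus one directory page when two-level translation is used. */
uint32_t gart_pages_needed(uint64_t agp_space_bytes, bool two_level)
{
    uint64_t entries = agp_space_bytes / PAGE_BYTES;
    uint64_t table_pages = (entries + ENTRIES_PER_PAGE - 1u) / ENTRIES_PER_PAGE;
    return (uint32_t)table_pages + (two_level ? 1u : 0u);
}

/* Two-level translation: split an offset into AGP device address space. */
uint32_t gart_dir_index(uint32_t dev_offset)   { return dev_offset >> 22; }            /* 4 MB per directory entry */
uint32_t gart_table_index(uint32_t dev_offset) { return (dev_offset >> 12) & 0x3FFu; } /* entry within table page */
uint32_t gart_page_offset(uint32_t dev_offset) { return dev_offset & 0xFFFu; }         /* byte within 4 KB page */
```

For 32 MB of device address space, single-level translation needs 8 table pages; 2 GB with two-level translation needs 512 table pages plus one directory page.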

Reserving GART Table Entries 

[0148] During run-time and upon receipt of the PCIMPReserveEntries() call from the operating system, the chipset miniport driver (MPD) performs the following functions to reserve GART table entries for the operating system:

1. The MPD searches the GART table to find "n" available contiguous entries, where "n" is the number of 4 KB pages requested by the operating system in the PCIMPReserveEntries() call. Upon finding the contiguous entries, the MPD reserves these entries for the operating system by setting the valid bit (bit 0) in each GART table entry.

2. The MPD then returns a map handle, which is the linear address of the first GART table entry reserved. This map handle is used later by the MPD to map and maintain GART table entries. Note that the map handle corresponds to the base address of the corresponding page in AGP device address space.
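Step 1 is a first-fit search for a run of free entries. A minimal sketch, assuming the GART table is visible to the driver as an array of 32-bit entries with the valid bit in bit 0; it returns an index rather than a linear address, so the map handle of step 2 would be the table's linear base address plus four times that index. The function name is hypothetical.

```c
#include <stdint.h>

#define GART_VALID (1u << 0)   /* bit 0: entry reserved/valid */

/* Find `want` contiguous entries whose valid bit is clear, mark them valid,
   and return the index of the first one, or -1 if no such run exists. */
long gart_reserve_entries(uint32_t *table, uint32_t n_entries, uint32_t want)
{
    uint32_t run = 0;
    if (want == 0)
        return -1;
    for (uint32_t i = 0; i < n_entries; i++) {
        run = (table[i] & GART_VALID) ? 0 : run + 1;   /* length of free run */
        if (run == want) {
            uint32_t first = i + 1 - want;
            for (uint32_t j = first; j <= i; j++)
                table[j] |= GART_VALID;                /* reserve for the OS */
            return (long)first;
        }
    }
    return -1;
}
```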


Mapping GART Table Entries 

[0149] After GART table entries have been reserved and upon receipt of the PCIMPMapEntries() call from the operating system, the chipset miniport driver (MPD) performs the following functions to map previously allocated 4 KB pages in physical memory with reserved GART table entries:

1. The MPD converts the system linear address provided by the PCIMPMapEntries() call into a physical address using the PCILinToDev() command. The resulting address represents the base address of the particular 4 KB page in physical system memory. Note that the noncontiguous 4 KB pages in physical address space appear to the processor as contiguous in system linear address space.

2. The MPD writes the resulting physical address to the particular GART table entry indexed by the map handle. This map handle is obtained while reserving GART table entries and is passed to the MPD by the operating system. The map handle is a linear address to the respective GART table entry. Since the pages reside on 4 KB boundaries, bits 31:12 are written to bits 31:12 in the GART table entry.

3. If linking is supported in the system, the link bit (bit 1) is set as required in the corresponding entry by the MPD. The link bit indicates that the next GART table entry is associated with the current GART table entry. When mapping "n" entries with linking enabled, the link bit should be set in entries 1 through n-1. For example, when mapping 8 entries as a result of the PCIMPMapEntries() call, it is assumed that all 8 entries are associated. Setting the link bit for entries 1 through 7 will allow entries 2 through 8 to be prefetched and cached in the GART table cache. Note that this assumes the chipset performs burst memory accesses during GART table lookups.


4. Repeat steps 1-3 "n" times, where "n" is the number of pages that need mapping. Note that the map handle and the system linear address must be incremented during each iteration.

5. Upon completion of steps 1-4, MPD gets a pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's Host to PCI bridge configuration header. Using this pointer, MPD flushes the chipset's Host-Memory posted write buffers by setting the Flush Posted Write Buffers bit (bit 0) in the Posted Write Buffer Control Register (offset 14h) to 1. This bit gets reset to 0 by the chipset upon completion. The MPD does not have to poll this bit to verify completion of the flush. Instead, it performs a read-back of the last entry that was written to the GART table. Completion of the flush is guaranteed before the data is returned from the read-back.
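Steps 1 to 5 can be condensed into the following sketch. It assumes the physical page addresses have already been obtained with PCILinToDev() and are passed in as an array, that the AGP memory-mapped registers are 32 bits wide and addressed as `agp_regs[offset / 4]`, and that the table is visible as an array of 32-bit entries; all function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GART_VALID      (1u << 0)
#define GART_LINK       (1u << 1)
#define GART_ADDR_MASK  0xFFFFF000u      /* bits 31:12 of the page address */
#define PWB_CONTROL_REG 0x14             /* Posted Write Buffer Control (byte offset) */
#define PWB_FLUSH       (1u << 0)

/* Map n reserved entries starting at table[first] to the physical pages in
   phys[]. Returns the read-back of the last entry written, which guarantees
   that the posted write buffer flush has completed. */
uint32_t gart_map_entries(volatile uint32_t *table, uint32_t first,
                          const uint32_t *phys, uint32_t n, bool linking,
                          volatile uint32_t *agp_regs)
{
    if (n == 0)
        return 0;
    for (uint32_t k = 0; k < n; k++) {
        uint32_t e = (phys[k] & GART_ADDR_MASK) | GART_VALID;  /* keep reservation */
        if (linking && k + 1u < n)
            e |= GART_LINK;              /* entries 1..n-1 link to the next one */
        table[first + k] = e;
    }
    agp_regs[PWB_CONTROL_REG / 4] |= PWB_FLUSH;   /* flush host-memory posted writes */
    return table[first + n - 1u];                 /* read-back instead of polling */
}
```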

Flushing L1/L2 Caches 

[0150] Immediately following mapping GART table entries and upon receipt of the PCIMPFlushPages() call from the operating system, the chipset miniport driver (MPD) performs the following functions to flush specific pages in the L1/L2 caches:

1. MPD flushes the L1 cache using the processor's CR3 register.

2. MPD flushes the specific pages from L2 cache, if possible. If the MPD is incapable of flushing a specific L2 page, then it should not flush the entire L2 cache. Instead, it should do nothing.

Unmapping GART Table Entries and Maintaining GART Cache and Link Bit Coherency

[0151] During run-time and upon receipt of the PCIMPUnMapEntries() call from the operating system, the chipset miniport driver (MPD) performs the following functions to unmap GART table entries while maintaining GART cache coherency:

1. Using the map handle provided by the PCIMPUnMapEntries() call as a linear address into the GART table, the MPD initializes the indexed GART table entry (excluding the valid bit) to some invalid state. The valid bit remains valid to indicate that this entry is still reserved for the application.

2. If GART caching is enabled, the MPD must invalidate either the particular cached entry or the entire GART cache. To invalidate a particular GART cache line, the MPD writes the AGP device address to bits 31:12 of the GART Cache Entry Control register (offset 10h) and sets the GART Cache Entry Invalidate bit (bit 0) to 1 in that same register. The single GART cache entry will be invalidated. Upon completion, bit 0 will be reset to zero by the chipset. If the entry does not exist, the request is ignored. To invalidate the entire GART cache, the MPD writes a 1 to the GART Cache Invalidate bit (bit 0) of the GART Cache Control register (offset 0Ch). The entire GART cache will be automatically invalidated. Upon completion, the Cache Invalidate bit will be reset to zero by the chipset.

[0152] Invalidation of the entire GART cache is preferably performed after all "n" GART table entries have been invalidated, where "n" is the number of GART table entries to free provided by the PCIMPFreeEntries() call.
3. If linking is enabled, the MPD must ensure that link bit coherency is maintained. For example, if GART table entries 0, 1, 2, and 3 exist with the link bit set in entries 0, 1, and 2, and entries 2 and 3 are freed, then the link bit in entry 1 must be disabled. Failure to maintain link bit coherency will result in unnecessary caching of GART table entries.


4. Repeat steps 1-3 "n" times, where "n" is the number of GART table entries to free. This value is provided as an input parameter by the PCIMPFreeEntries() call. Note that the map handle must be incremented during each iteration.

5. Upon completion of steps 1-4, MPD gets a pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's Host to PCI bridge configuration header. Using this pointer, MPD flushes the chipset's Host-Memory posted write buffers by setting the Flush Posted Write Buffers bit (bit 0) in the Posted Write Buffer Control Register (offset 14h) to 1. This bit gets reset to 0 by the chipset upon completion. The MPD does not have to poll this bit to verify completion of the flush. Instead, it performs a read-back of the last entry that was written to the GART table. Completion of the flush is guaranteed before the data is returned for the read-back.
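The unmapping sequence, including the link-bit fix-up from step 3, might be sketched as follows. This sketch assumes the AGP memory-mapped registers are 32 bits wide and addressed as `agp_regs[offset / 4]`, that table entry i translates AGP device address `agp_base + i * 4 KB`, and that the hardware clears the invalidate bits itself; the caller chooses per-entry or whole-cache invalidation. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GART_VALID            (1u << 0)
#define GART_LINK             (1u << 1)
#define GART_CACHE_CTRL       0x0C   /* GART Cache Control register */
#define GART_CACHE_ENTRY_CTRL 0x10   /* GART Cache Entry Control register */
#define PWB_CONTROL_REG       0x14   /* Posted Write Buffer Control register */
#define INVALIDATE_BIT        (1u << 0)
#define PWB_FLUSH             (1u << 0)

/* Unmap n entries starting at table[first]. agp_base is the base of AGP
   device address space (from BAR0). Returns the read-back of the last entry. */
uint32_t gart_unmap_entries(volatile uint32_t *table, uint32_t first, uint32_t n,
                            uint32_t agp_base, bool per_entry,
                            volatile uint32_t *agp_regs)
{
    if (n == 0)
        return 0;
    for (uint32_t k = 0; k < n; k++) {
        uint32_t idx = first + k;
        table[idx] = GART_VALID;     /* step 1: invalid contents, still reserved */
        if (per_entry) {
            /* step 2: device address in bits 31:12, invalidate in bit 0. A real
               driver may wait for the chipset to clear bit 0 (not modeled). */
            agp_regs[GART_CACHE_ENTRY_CTRL / 4] =
                ((agp_base + idx * 4096u) & 0xFFFFF000u) | INVALIDATE_BIT;
        }
    }
    if (!per_entry)
        agp_regs[GART_CACHE_CTRL / 4] |= INVALIDATE_BIT;  /* whole cache, once */
    if (first > 0 && (table[first - 1u] & GART_LINK))
        table[first - 1u] &= ~GART_LINK;                 /* step 3: link coherency */
    agp_regs[PWB_CONTROL_REG / 4] |= PWB_FLUSH;          /* step 5: flush */
    return table[first + n - 1u];                        /* read-back */
}
```

With `per_entry` set to false, the whole-cache invalidate is issued once after the loop, as paragraph [0152] prefers.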

Freeing GART Table Entries

[0153] Upon receipt of the PCIMPFreeEntries() call from the operating system, the chipset miniport driver (MPD) performs the following functions to free GART table entries:

1. Using the map handle provided by the PCIMPFreeEntries() call as a linear address to the GART table entry, the MPD sets the GART table entry's valid bit to invalid (0). This step is performed "n" times, where "n" is the number of pages passed in the PCIMPFreeEntries() call.

2. Upon completion of step 1, MPD gets a pointer to AGP memory mapped control registers from Base Address Register 1 (BAR1 - offset 14h) in the chipset's Host to PCI bridge configuration header. Using this pointer, MPD flushes the chipset's Host-Memory posted write buffers by setting the Flush Posted Write Buffers bit (bit 0) in the Posted Write Buffer Control Register (offset 14h) to 1. This bit gets reset to 0 by the chipset upon completion. The MPD does not have to poll this bit to verify completion of the flush. Instead, it performs a read-back of the last entry that was written to the GART table. Completion of the flush is guaranteed before the data is returned for the read-back.
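Freeing differs from unmapping in that the valid bit itself is cleared. A minimal sketch, assuming 32-bit GART table entries and 32-bit memory-mapped registers addressed as `agp_regs[offset / 4]`; the function name is hypothetical.

```c
#include <stdint.h>

#define GART_VALID      (1u << 0)
#define PWB_CONTROL_REG 0x14          /* Posted Write Buffer Control register */
#define PWB_FLUSH       (1u << 0)

/* Clear the valid bit of n entries starting at table[first], flush the
   chipset's posted write buffers, and read back the last entry written. */
uint32_t gart_free_entries(volatile uint32_t *table, uint32_t first, uint32_t n,
                           volatile uint32_t *agp_regs)
{
    if (n == 0)
        return 0;
    for (uint32_t k = 0; k < n; k++)
        table[first + k] &= ~GART_VALID;          /* step 1: release entry */
    agp_regs[PWB_CONTROL_REG / 4] |= PWB_FLUSH;   /* step 2: flush */
    return table[first + n - 1u];                 /* read-back ensures completion */
}
```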

Terminating GART Table Functionality

[0154] Upon receipt of the PCIMPExit() call from the operating system, the chipset miniport driver (MPD) performs 
the following functions to disable GART functionality: 

1. MPD flushes GART directory and table caches by writing a 1 to the GART Cache Invalidate bit (bit 0) of the GART Directory/Table Cache Control register (offset 0Ch). The entire GART cache will be automatically invalidated. Upon completion, the Cache Invalidate bit will be reset to zero by the chipset.

2. The MPD calls PCIFreePages() to free the pages allocated to the GART table. The MPD must supply the linear address of
the base of the GART table and the number of pages to free.

3. The MPD initializes the freed pages by writing 0s to all of the previously allocated GART table locations.

[0155] AGP functionality preferably is disabled before terminating GART functionality. AGP functionality is disabled
in the master before it is disabled in the target.
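A comparable sketch of the PCIMPExit() sequence follows, assuming AGP has already been disabled in the master and then in the target as described in [0155]. The cache control offset (0Ch) and the invalidate bit (bit 0) come from the text; `mpd_exit` is a hypothetical name and `free_gart_pages` is a hypothetical stand-in for PCIFreePages() that releases `malloc` memory. Unlike the listed order, the sketch clears the table before releasing it, because in C writing to memory after it has been freed is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* From the text: GART Directory/Table Cache Control register and its bit. */
#define GART_CACHE_CTRL_OFFSET 0x0Cu
#define GART_CACHE_INVALIDATE  (1u << 0)

/* Hypothetical stand-in for PCIFreePages(): base address plus page count. */
static void free_gart_pages(void *linear_base, size_t page_count)
{
    (void)page_count;   /* a real page allocator would use this */
    free(linear_base);
}

/* Hypothetical PCIMPExit() helper. */
static void mpd_exit(volatile uint8_t *agp_regs, uint32_t *gart_base,
                     size_t page_count, size_t page_size)
{
    assert(gart_base != NULL);

    /* Step 1: invalidate the whole GART directory/table cache; the chipset
       clears the bit again when the invalidate has completed. */
    volatile uint32_t *cache_ctrl =
        (volatile uint32_t *)(agp_regs + GART_CACHE_CTRL_OFFSET);
    *cache_ctrl |= GART_CACHE_INVALIDATE;

    /* Step 3, done first here: zero every previously allocated GART location
       while the memory is still owned by the driver. */
    memset(gart_base, 0, page_count * page_size);

    /* Step 2: release the GART table pages. */
    free_gart_pages(gart_base, page_count);
}
```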

Operating System 

[0156] The operating system performs the following AGP functions:

• Sets the data transfer rate in both master and target.

• Enables sideband addressing in both master and target as required.

• Sets request queue depth in master.

• Enables AGP in target and master.

• Allocates and frees physical memory as required.

• Performs read/write services for the GART miniport driver.

[0157] Reference is directed to Microsoft's AGP Software Functional Specification for more details regarding operating
system functionality as applied to AGP.

Graphics Driver/Direct X 

[0158] The graphics driver or Direct X performs the following AGP functions:

• Reserves pages of AGP device memory for use by the application.

• Commits pages of reserved device memory, thus allocating system memory.

• Uncommits pages of reserved device memory, thus deallocating system memory.

• Frees previously reserved pages of AGP device memory.

• Obtains information about committed memory.

[0159] Reference is directed to Microsoft's AGP Software Functional Specification for more details regarding graphics
driver and Direct X driver functionality as applied to AGP.


[0161] The present invention, therefore, is well adapted to carry out the objects and attain the ends and advantages
mentioned, as well as others inherent therein. While the present invention has been depicted, described, and is defined
by reference to particular preferred embodiments of the invention, such references do not imply a limitation on the
invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration,
and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts. The depicted and
described preferred embodiments of the invention are exemplary only, and are not exhaustive of the scope of the
invention. Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims,
giving full cognizance to equivalents in all respects.



Claims 

1. A computer system, comprising: 

a system processor executing software instructions and generating graphics data;
a system memory having an addressable memory space comprising a plurality of bytes of storage, wherein
each of the plurality of bytes of storage has a unique address;

the software instructions and the graphics data being stored in some of the plurality of bytes of storage of said
system memory, wherein the graphics data is stored in a plurality of pages of graphics data, each of the plurality
of pages of graphics data comprising a number of the plurality of bytes of storage;
a graphics processor generating video display data from the graphics data and adapted for connection to a
video display to display the video display data;

a first interface logic for connecting said system processor to said system memory;

a second interface logic for connecting said system processor and said system memory to said graphics processor;

said second interface logic having a cache memory and a cache entry control register;

said cache memory having a plurality of storage locations, each of the plurality of storage locations comprising
an address portion, an entry portion, an entry update portion and an entry invalidate portion;

a graphics address remapping table (GART table) having a plurality of entries, each of the plurality of GART
table entries comprising an address pointer to a corresponding one of the plurality of pages of graphics data;
and

said second interface logic reading selected ones of the plurality of GART table entries and storing the selected
ones in the entry portions of the plurality of storage locations of said cache memory, the storage locations
being associated with graphics device addresses asserted by said graphics processor; and

said cache entry control register adapted to receive information for a graphics device address, an entry update
and an entry invalidate from an applications programming interface (API) of the software instructions; wherein,
if the received information through said cache entry control register causes the entry update portion to be set
to a first logic level, said second interface logic will read the plurality of GART entries and update a one of the
plurality of storage locations associated with the graphics device address received by said cache entry control
register; and

if the received information through said cache entry control register causes the entry invalidate portion to be
set to the first logic level, said second interface logic will invalidate the one of the plurality of storage locations
associated with the graphics device address received by said cache entry control register.

2. The computer system of claim 1, further comprising:

a cache entry update bit in said cache entry control register that can be set to the first logic level by the API
writing to said cache entry control register and can be read by the API to determine if set to the first logic level
or cleared to a second logic level, wherein setting the cache entry update bit to the first logic level by the API
causes said second interface logic to update from the plurality of GART table entries stored in said system
memory the one of the plurality of storage locations associated with the graphics device address received by
said cache entry control register from the API; and

said second interface logic clearing the cache entry update bit to the second logic level after updating the one
of the plurality of storage locations associated with the graphics device address received by said cache entry
control register.

3. The computer system of claim 1, further comprising:

a cache entry invalidate bit in said cache entry control register that can be set to the first logic level by the API
writing to said cache entry control register and can be read by the API to determine if set to the first logic level
or cleared to a second logic level, wherein setting the cache entry invalidate bit to the first logic level by the
API causes said second interface logic to invalidate the one of the plurality of storage locations associated
with the graphics device address received by said cache entry control register from the API; and

said second interface logic clearing the cache entry invalidate bit to the second logic level after invalidating
the one of the plurality of storage locations associated with the graphics device address received by said cache
entry control register.
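The per-entry update and invalidate mechanism of claims 1 to 3 can be modelled in software to show why the other cached entries are left undisturbed. The sketch below is a hypothetical behavioral model, not a description of the chipset hardware: the struct layouts, the four-slot cache, page-granular address matching, the indexing of the GART table by device page number, and the names `chipset_service` and `api_write_ctrl` are all assumptions. It treats `true` as the first logic level (as in claim 10). The service routine is called synchronously here; in hardware it would run on its own, and software would read the bits back until they return to the second logic level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u   /* 4096-byte pages, as in claims 18 and 27 */
#define CACHE_SLOTS 4u    /* hypothetical cache size */

/* One storage location: address, entry, update and invalidate portions. */
struct gart_cache_slot {
    uint32_t dev_page;    /* address portion (graphics device page number) */
    uint32_t entry;       /* entry portion (cached GART table entry) */
    bool     valid;
    bool     update;      /* entry update portion */
    bool     invalidate;  /* entry invalidate portion */
};

/* Hypothetical layout of the cache entry control register. */
struct cache_entry_ctrl {
    uint32_t dev_addr;
    bool     update_bit;
    bool     invalidate_bit;
};

struct gart_chipset {
    struct gart_cache_slot slot[CACHE_SLOTS];
    struct cache_entry_ctrl ctrl;
    const uint32_t *gart_table;  /* GART table in system memory, by page */
};

/* Chipset side: act only on the slot holding the named address, then clear
   the request bits (second logic level) so software can see completion. */
static void chipset_service(struct gart_chipset *cs)
{
    uint32_t page = cs->ctrl.dev_addr >> PAGE_SHIFT;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct gart_cache_slot *s = &cs->slot[i];
        if (!s->valid || s->dev_page != page)
            continue;                          /* other slots untouched */
        s->update = cs->ctrl.update_bit;
        s->invalidate = cs->ctrl.invalidate_bit;
        if (s->update) {                       /* reload from memory */
            s->entry = cs->gart_table[page];
            s->update = false;
        }
        if (s->invalidate) {                   /* drop only this entry */
            s->valid = false;
            s->invalidate = false;
        }
    }
    cs->ctrl.update_bit = false;
    cs->ctrl.invalidate_bit = false;
}

/* API side (e.g. a GART miniport driver): request an update and/or an
   invalidate for one graphics device address. */
static void api_write_ctrl(struct gart_chipset *cs, uint32_t dev_addr,
                           bool update, bool invalidate)
{
    assert(cs != NULL);
    cs->ctrl.dev_addr = dev_addr;
    cs->ctrl.update_bit = update;
    cs->ctrl.invalidate_bit = invalidate;
    chipset_service(cs);
}
```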

4. The computer system of claim 1, wherein the applications programming interface (API) is a GART miniport driver.

5. The computer system of claim 1, wherein said second interface logic uses the selected ones of the plurality of
GART table entries stored in said cache memory to point to addresses of associated pages of a first portion of the
graphics data stored in said system memory, the associated pages of the first portion of the graphics data being
read by said graphics processor to generate the video display data.

6. The computer system of claim 5, further comprising a local frame buffer memory connected to said graphics
processor, said local frame buffer storing a second portion of the graphics data from said system memory.

7. The computer system of claim 5, wherein the associated pages of the first portion of the graphics data are stored
in random non-contiguous pages of the plurality of pages of graphics data.

8. The computer system of claim 6, wherein said local frame buffer memory stores the second portion of the graphics
data in contiguous virtual address space and said graphics processor accesses the first portion of the graphics
data in contiguous virtual address space by using the selected ones of the plurality of GART table entries stored
in said cache memory and accesses the second portion of the graphics data from said local frame buffer memory.

9. The computer system of claim 8, wherein said graphics processor reads the first and second portions of the graphics 
data in contiguous virtual address space. 
10. The computer system of claim 1, wherein the first logic level is a logic 1 and the second logic level is a logic 0.

11. The computer system of claim 1, wherein the first logic level is a logic 0 and the second logic level is a logic 1.

12. The computer system of claim 1, further comprising a third interface logic for connecting said system processor
and said system memory to input-output devices. 

13. The computer system of claim 1, further comprising a fourth interface logic for connecting said system processor
and said system memory to storage devices. 

14. The computer system of claim 1, wherein the plurality of GART table entries are stored in said system memory.

15. The computer system of claim 1, wherein the plurality of GART table entries are stored in a plurality of pages of
GART table entries in said system memory. 

16. The computer system of claim 15, wherein the plurality of pages of GART table entries are stored in said system
memory in a non-contiguous and random order.

17. The computer system of claim 16, further comprising a GART directory having a plurality of entries, each of the 
plurality of GART directory entries comprising an address pointer to a corresponding one of the plurality of pages 
of GART table entries, wherein said second interface logic uses the plurality of GART directory entries for locating 
the plurality of pages of GART table entries in said system memory. 

18. The computer system of claim 1, wherein the number of the plurality of bytes of storage in each of the plurality of
pages of graphics data is 4096 bytes.
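Claims 15 to 18 describe GART table pages that are located through a GART directory, with 4096-byte pages of graphics data. A two-level lookup of that kind might look like the following sketch. The 32-bit entry size, the 10/10/12 split of the graphics device address, the use of bits 31:12 of each entry as the base address (leaving the low bits free for flags), and the names `gart_translate`, `phys_read32` and `phys_write32` are assumptions for illustration; physical memory is simulated with an array.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE        4096u
#define ENTRIES_PER_PAGE (PAGE_SIZE / 4u)   /* assumes 32-bit entries */
#define BASE_MASK        0xFFFFF000u        /* assumed: bits 31:12 = base */

/* Simulated physical memory: 8 pages, byte-addressed, read as 32-bit words. */
static uint32_t phys_mem[8 * ENTRIES_PER_PAGE];

static uint32_t phys_read32(uint32_t addr) { return phys_mem[addr / 4u]; }
static void phys_write32(uint32_t addr, uint32_t v) { phys_mem[addr / 4u] = v; }

/* Translate a graphics device address to a physical address through a GART
   directory located at physical address 'dir_base'. */
static uint32_t gart_translate(uint32_t dir_base, uint32_t dev_addr)
{
    uint32_t dir_index   = dev_addr >> 22;                /* bits 31:22 */
    uint32_t table_index = (dev_addr >> 12) & 0x3FFu;     /* bits 21:12 */
    uint32_t offset      = dev_addr & (PAGE_SIZE - 1u);   /* bits 11:0 */

    /* The directory entry points at a (possibly non-contiguous) page of
       GART table entries. */
    uint32_t dir_entry  = phys_read32(dir_base + 4u * dir_index);
    uint32_t table_page = dir_entry & BASE_MASK;

    /* The GART entry holds the base address of the graphics data page;
       the page offset is added to it. */
    uint32_t gart_entry = phys_read32(table_page + 4u * table_index);
    return (gart_entry & BASE_MASK) + offset;
}
```

The same base-plus-offset step at the end corresponds to the address computation recited in claim 44.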

19. The computer system of claim 1, further comprising a video display.

20. A computer system having a core logic chipset which connects a central processing unit and random access 
memory to an accelerated graphics port (AGP) bus, said system comprising: 

a central processing unit connected to a host bus; 

a random access memory connected to a random access memory bus; 

a core logic chipset connected to the host bus and the random access memory bus; 

said core logic chipset having a first interface bridge for connecting the host bus to the random access memory
bus;

said core logic chipset having a second interface bridge for connecting the host bus to an accelerated graphics 
port (AGP) bus; 

said core logic chipset having a third interface bridge for connecting the random access memory bus to the
AGP bus;

said core logic chipset having a cache memory and a cache entry control register; 

said cache memory having a plurality of storage locations, each of the plurality of storage locations comprising 
an address portion, an entry portion, an entry update portion and an entry invalidate portion; 
said core logic chipset using a graphics address remapping table (GART table) having a plurality of entries, 
each of the plurality of GART table entries comprising an address pointer to a corresponding one of a plurality 
of pages of graphics data stored in said random access memory; 

said core logic chipset reading selected ones of the plurality of GART table entries stored in said random 
access memory and storing the selected ones of the plurality of GART table entries in the entry portions of 
the plurality of storage locations of said cache memory, each of the entry portions associated with a one of
the address portions; and 

said cache entry control register adapted to receive information for a graphics device address, an entry update 
and an entry invalidate; wherein, 

if the received information through said cache entry control register causes the entry update portion to be set 
to a first logic level, said core logic chipset will read the plurality of GART entries and update a one of the 
plurality of storage locations associated with the graphics device address received by said cache entry control 
register; and 

if the received information through said cache entry control register causes the entry invalidate portion to be
set to the first logic level, said core logic chipset will invalidate the one of the plurality of storage locations
associated with the graphics device address received by said cache entry control register.

21. The computer system of claim 20, wherein the central processing unit is a plurality of central processing units.

22. The computer system of claim 20, wherein the plurality of pages of graphics data are stored in said random access
memory in a non-contiguous and random order. 

23. The computer system of claim 22, wherein each one of the plurality of GART table entries comprises a plurality of
binary bits and each one of the plurality of pages of graphics data is associated with the each one of the plurality
of GART table entries such that a first number of most significant bits of the plurality of binary bits comprise a base
address of the associated each one of the plurality of pages of graphics data.

24. The computer system of claim 20, further comprising:

a cache entry update bit in said cache entry control register that can be set to the first logic level and can be
read to determine if set to the first logic level or cleared to a second logic level, wherein setting the cache entry
update bit to the first logic level causes said core logic chipset to update from the plurality of GART table
entries stored in said system memory the one of the plurality of storage locations associated with the graphics
device address received by said cache entry control register; and

said core logic chipset clearing the cache entry update bit to the second logic level after updating the one of
the plurality of storage locations associated with the device address received by said cache entry control
register.

25. The computer system of claim 20, further comprising:

a cache entry invalidate bit in said cache entry control register that can be set to the first logic level and can
be read to determine if set to the first logic level or cleared to a second logic level, wherein setting the cache
entry invalidate bit to the first logic level causes said core logic chipset to invalidate the one of the plurality of
storage locations associated with the graphics device address received by said cache entry control register;
and

said core logic chipset clearing the cache entry invalidate bit to the second logic level after invalidating the
one of the plurality of storage locations associated with the graphics device address received by said cache
entry control register.

26. The computer system of claim 20, wherein the plurality of GART table entries are stored in at least one page of
said random access memory. 

27. The computer system of claim 20, wherein each of the plurality of pages of graphics data is 4096 bytes. 

28. The computer system of claim 20, wherein said core logic chipset is at least one integrated circuit.

29. The computer system of claim 28, wherein said at least one integrated circuit core logic chipset is at least one 
application specific integrated circuit. 

30. The computer system of claim 28, wherein said at least one integrated circuit core logic chipset is at least one
programmable logic array integrated circuit. 

31. The computer system of claim 20, wherein said central processing unit executes software instructions and
generates the graphics data.

32. The computer system of claim 20, further comprising a graphics processor for generating video display data based 
upon the graphics data. 

33. The computer system of claim 32, further comprising a local frame buffer memory coupled to said graphics
processor, wherein said graphics processor combines video data stored in said local frame buffer memory with the
associated ones of the plurality of pages of graphics data read from said random access memory based upon the
selected ones of the plurality of GART table entries stored in said cache memory to generate video display data.
34. The computer system of claim 20, further comprising said core logic chipset having a fourth interface bridge for
connecting the host bus to a peripheral component interconnect (PCI) bus adapted for connection to input-output 
devices. 

35. The computer system of claim 34, further comprising said core logic chipset having a fifth interface bridge for
connecting the random access memory bus to the PCI bus. 

36. The computer system of claim 34, further comprising said core logic chipset having a sixth interface bridge for
connecting the AGP bus to the PCI bus. 

37. The computer system of claim 20, further comprising a network interface card, a hard disk, a floppy disk drive, a 
modem, a keyboard and a mouse. 

38. The computer system of claim 20, further comprising a serial port, a parallel port and a real time clock. 


39. The computer system of claim 20, further comprising a read only memory basic input-output system (ROM BIOS), 
a non-volatile random access memory (NVRAM), a tape drive and a CD ROM drive. 

40. A method, in a computer system, of updating and invalidating individual selected ones of a plurality of graphics
address remapping table (GART table) entries stored in a cache memory, said method comprising the steps of:

storing a plurality of pages of graphics data in any order in a computer system memory;

storing a plurality of entries of a graphics address remapping table (GART table) in the computer system
memory, wherein each one of the plurality of GART table entries corresponds to a one of the plurality of pages
of graphics data stored in the computer system memory;

reading selected ones of the plurality of GART table entries stored in the computer system memory;
storing the selected ones read from the computer system memory into a cache memory, wherein the cache
memory has a plurality of storage locations, each of the plurality of storage locations comprising a graphics
device address portion, an entry portion, an entry update portion and an entry invalidate portion, wherein the
selected ones are stored in the entry portions;

writing a first logic level to the entry update portion of a one of the plurality of storage locations when an
associated one of the selected ones requires updating from the computer system memory; and
writing the first logic level to the entry invalidate portion of a one of the plurality of storage locations when an
associated one of the selected ones is invalid.

41. The method of claim 40, further comprising the steps of:

reading the entry update portions of the plurality of storage locations of the cache memory;
reading a new selected one of the plurality of GART table entries in the computer system memory for each of
the entry update portions containing the first logic level;

storing the new selected one in the entry portion associated with each of the entry update portions containing
the first logic level; and

resetting each of the entry update portions to a second logic level after storing the new selected one in the
entry portion.

42. The method of claim 40, further comprising the steps of: 

reading the entry invalidate portions of the plurality of storage locations of the cache memory;
invalidating the storage location associated with each of the entry invalidate portions containing the first logic
level; and

resetting each of the entry invalidate portions to a second logic level after invalidating the storage location 
associated therewith. 

43. The method of claim 40, further comprising the step of reading associated ones of the plurality of pages of graphics
data in an order determined by the selected ones of the plurality of GART table entries stored in the cache memory.

44. The method of claim 40, wherein a system memory address is determined for each byte of graphics data stored
in the plurality of pages of graphics data by a base address stored in the associated one of the plurality of GART
table entries and an offset address added to the base address.

45. The method of claim 40, further comprising the step of allocating memory locations in the computer system for
storing the plurality of GART table entries during initialization of the computer system.

46. The method of claim 41, further comprising the step of writing to a cache entry control register a graphics device
address of a selected one of the plurality of GART table entries stored in the cache memory and the first logic level
to a cache entry update bit of the cache entry control register, wherein the first logic level is written to the entry
update portion of the one of the plurality of storage locations associated with the graphics device address.

47. The method of claim 46, further comprising the step of clearing the cache entry update bit from the first logic level 
to a second logic level after storing the associated new selected one. 

48. The method of claim 42, further comprising the step of writing to a cache entry control register a graphics device
address of a selected one of the plurality of GART table entries stored in the cache memory and the first logic level
to a cache entry invalidate bit of the cache entry control register, wherein the first logic level is written to the entry
invalidate portion of the one of the plurality of storage locations associated with the graphics device address.

49. The method of claim 48, further comprising the step of clearing the cache entry invalidate bit from the first logic
level to a second logic level after invalidating the storage location associated therewith.

50. A core logic chipset adapted for connection to a computer central processing unit and random access memory,
an accelerated graphics port (AGP) bus and a peripheral component interconnect (PCI) bus, comprising:

an accelerated graphics port (AGP) request queue;

an AGP reply queue; 
an AGP data and control logic; 

said AGP data and control logic having an AGP cache entry control register; 
an AGP cache memory; 

said AGP cache memory having a plurality of storage locations, each of the plurality of storage locations
comprising a graphics device address portion, an entry portion, an entry update portion and an entry invalidate
portion;

an AGP arbiter; 

a host to peripheral component interconnect (PCI) bridge; 
a PCI to PCI bridge;

a memory interface and control logic adapted for connecting to a computer system random access memory; 
and 

a host bus interface adapted for connecting to a computer system host bus having at least one central processing
unit connected thereto; wherein,
said AGP request and reply queues are connected to said memory interface and control logic;

said AGP data and control logic is connected to said memory interface and control logic;
said AGP data and control logic is connected to the host bus interface; 

said host to PCI bus bridge is connected to the host bus interface and is adapted for connection to a PCI bus; 
said PCI to PCI bridge is connected to said AGP data and control logic, wherein said PCI to PCI bridge transfers
PCI information transactions between said Host to PCI bus bridge and said AGP data and control logic;

said AGP data and control logic and said AGP arbiter adapted for connection to an AGP bus having an AGP 
device; wherein 

said AGP data and control logic is adapted to use a graphics address remapping table (GART table) having
a plurality of entries, each of the plurality of GART table entries comprising an address pointer to a one of a
plurality of pages of graphics data stored in the computer system random access memory;

said AGP data and control logic is adapted to read selected ones of the plurality of GART table entries stored 
in said random access memory and is adapted to store the selected ones of the plurality of GART table entries 
in the entry portions of the plurality of storage locations of said cache memory, each of the entry portions 
associated with a one of the graphics device address portions; and 

said cache entry control register adapted to receive information for a graphics device address, an entry update
and an entry invalidate; wherein,

if the entry update of the received information causes the entry update portion to be set to a first logic level,
said AGP data and control logic is adapted to update from the GART table entries stored in said random access
memory the one of the plurality of storage locations associated with the graphics device address; and
if the entry invalidate of the received information causes the entry invalidate portion to be set to the first logic 
level, said AGP data and control logic is adapted to invalidate the one of the plurality of storage locations 
associated with the graphics device address. 

51. The core logic chipset of claim 50, further comprising: 

a cache entry update bit in said cache entry control register that can be set to the first logic level and can be 
read to determine if set to the first logic level or cleared to a second logic level, wherein setting the cache entry 
update bit to the first logic level causes said AGP data and control logic to update from the GART table entries
stored in said system memory the one of the plurality of storage locations associated with the graphics device 
address received by said cache entry control register; and 

said AGP data and control logic clearing the cache entry update bit to the second logic level after updating 
the one of the plurality of storage locations associated with the graphics device address received by said cache 
entry control register. 

52. The core logic chipset of claim 50, further comprising: 

a cache entry invalidate bit in said cache entry control register that can be set to the first logic level and can 
be read to determine if set to the first logic level or cleared to a second logic level, wherein setting the cache 
entry invalidate bit to the first logic level causes said AGP data and control logic to invalidate the one of the 
plurality of storage locations associated with the graphics device address received by said cache entry control 
register; and 

said AGP data and control logic clearing the cache entry invalidate bit to the second logic level after invalidating 
the one of the plurality of storage locations associated with the graphics device address received by said cache 
entry control register.